REMARKS

This communication responds to the Office Action mailed on November 3, 2004. 
Claim 10 is amended, no claims are canceled, and no claims are added. As a result, 
claims 1-28 are now pending in this Application. The claims are reproduced in Appendix 
A for convenient reference by the Examiner. 

1. REAL PARTY IN INTEREST 

The real party in interest of the above-captioned Application is the Assignee, Intel 
Corporation. 

2. RELATED APPEALS AND INTERFERENCES 

There are no interferences or appeals known to Appellants, Appellants' legal 
representative, or the Assignee that will directly affect or be directly affected by or have a 
bearing on the Board's decision in an appeal in this matter. 

3. STATUS OF THE CLAIMS 

Claims 1-28 are currently pending in the Application. Claims 24-28 have been 
allowed. Objections have been raised with respect to claims 3-9, 11-13, 15, 19, 20, 22, 
and 23 as being dependent on rejected base claims. Claims 1-2, 10, 14, 16-18, and 21 
stand rejected, and the rejection is appealed herein. 

4. STATUS OF AMENDMENTS

No amendments have been made subsequent to the amendment to claim 28 for 
reasons unrelated to patentability in conjunction with the Office Action Response filed on
February 23, 2004. However, the Board is respectfully requested to consider the 
following amendment which provides clarity and consistency, and is not related to 
patentability: 

10. (Currently Amended) A method of preparing a circuit model for simulation, the 
circuit model having a model size, and the method comprising: 

merging a plurality of extended latch boundary components into a plurality of 
partitions having a partition size; and 

maintaining a load balance within the plurality of partitions. 

5. SUMMARY OF CLAIMED SUBJECT MATTER

This summary is presented in compliance with the requirements of Title 37 C.F.R.
§ 41.37(c)(1)(v), mandating a "concise explanation of the subject matter defined in each
of the independent claims involved in the appeal . . . ." Nothing contained in this summary
is intended to change the specific language of the claims described, nor is the language of
this summary to be construed so as to limit the scope of the claims in any way.

Some embodiments of the invention are related to a method of preparing a circuit 
model for simulation. The method may include decomposing the circuit model (having a 
number of latches) into a plurality of extended latch boundary components, and 
partitioning the plurality of extended latch boundary components. (Application, Claim 1; 
FIG. 2; pg. 2, lines 6-8 and pg. 3, line 20 - pg. 4, line 8). The method may also include 
merging a plurality of extended latch boundary components into a plurality of partitions
having a partition size and maintaining a load balance within the plurality of partitions.
(Application, Claim 10; FIG. 5; and pg. 6, line 27 - pg. 7, line 9). The method may also 
include grouping a plurality of extended latch boundary components into a plurality of 
partitions and reducing the communication time within the plurality of partitions by 
adjusting the grouping. (Application, Claim 14; FIG. 6; and pg. 7, lines 10-24). 

Some embodiments of the invention are related to a method of forming an 
extended latch boundary component. The method may include selecting a path having a 
first node selected from a group consisting of latches and primary outputs and a second 
node selected from a group consisting of latches and primary inputs, wherein the path can 
include a latch between the first node and the second node. (Application, Claim 16; FIG. 
1; and pg. 3, lines 5-19.) 

Some embodiments of the invention are related to a latch boundary component 
including a path comprising a plurality of first nodes selected from a group consisting of 
latches and primary outputs and a plurality of second nodes selected from a group
consisting of latches and primary inputs, where the path can include a plurality of latches 
between the plurality of first nodes and the plurality of second nodes. (Application, 
Claim 17; FIG. 1; and pg. 3, lines 5-19.) 

Some embodiments of the invention are related to a method of sharing a repeated
circuit structure in a circuit model. The method may include expanding the repeated 
circuit structure once to form an expanded circuit structure and grafting the expanded 
circuit structure to the circuit model as needed. (Application, Claim 18; FIG. 7; and pg. 
7, line 25 - pg. 8, line 5). 

Some embodiments of the invention are related to a method of simulating a circuit 
model. The method may include partitioning a plurality of extended latch boundary 
components to form a plurality of partitions having a size, preparing a plurality of 
simulations from the plurality of partitions, and executing the plurality of simulations on
a processing unit. (Application, Claim 21; FIG. 9; and pg. 8, line 14 - pg. 9, line 2). 
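
For illustration only, the partitioning and load-balancing steps summarized above can be pictured with a minimal sketch. It is not taken from the Application and is not offered as the claimed method: it assumes that each extended latch boundary component is reduced to a latch count, that the latch count stands in for simulation load, and that a simple greedy heuristic assigns components to partitions. The function names `partition_components` and `partition_loads` and the component labels are hypothetical.

```python
import heapq
from typing import Dict, List


def partition_components(latch_counts: Dict[str, int],
                         num_partitions: int) -> List[List[str]]:
    """Greedily assign components to partitions, largest first.

    Assumption: latch count is used as a stand-in for load. Each component
    goes to whichever partition currently carries the smallest load.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    # Min-heap of (current load, partition index).
    heap = [(0, index) for index in range(num_partitions)]
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    # Largest components first; ties broken by name for determinism.
    ordered = sorted(latch_counts.items(), key=lambda item: (-item[1], item[0]))
    for name, count in ordered:
        load, index = heapq.heappop(heap)
        partitions[index].append(name)
        heapq.heappush(heap, (load + count, index))
    return partitions


def partition_loads(latch_counts: Dict[str, int],
                    partitions: List[List[str]]) -> List[int]:
    """Return the total latch count carried by each partition."""
    return [sum(latch_counts[name] for name in part) for part in partitions]


# Hypothetical components labeled c0..c4 with made-up latch counts.
components = {"c0": 5, "c1": 4, "c2": 3, "c3": 3, "c4": 1}
parts = partition_components(components, 2)
loads = partition_loads(components, parts)
```

In this sketch, the difference between the largest and smallest entries of `loads` gives a rough measure of how well the load is balanced across partitions. Other load measures, such as partition size or latch activity, could be substituted for the latch count.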



6. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

6.1 Claim 18 stands rejected under 35 USC § 102(b) as being anticipated by Beausang et al.
(U.S. 5,828,579; hereinafter "Beausang-3"). [Note: the references are numbered to coincide with 
the designations assigned by the Examiner in the Office Action.] 

6.2 Claims 10, 16 and 17 stand rejected under 35 USC § 102(b) as being anticipated by 
Beausang et al. (U.S. 5,903,466; hereinafter "Beausang-1").

6.3 Claim 14 stands rejected under 35 USC § 103(a) as being unpatentable over Beausang-1 in
view of Beausang-3.

6.4 Claims 1-2 and 21 stand rejected under 35 USC § 103(a) as being unpatentable over 
Beausang-1 in view of Beausang et al. (U.S. 5,949,692; hereinafter "Beausang-2"). 



7. ARGUMENT 

7.1 The Applicable Law

Anticipation under 35 USC § 102 requires the disclosure in a single prior art reference of 
each element of the claim under consideration. See Verdegaal Bros. v. Union Oil Co. of
California, 814 F.2d 628, 631, 2 USPQ2d 1051, 1053 (Fed. Cir. 1987). It is not enough,
however, that the prior art reference discloses all the claimed elements in isolation. Rather,
"[a]nticipation requires the presence in a single prior reference disclosure of each and every
element of the claimed invention, arranged as in the claim." Lindemann Maschinenfabrik GmbH
v. American Hoist & Derrick Co., 730 F.2d 1452, 221 USPQ 481, 485 (Fed. Cir. 1984) (citing
Connell v. Sears, Roebuck & Co., 722 F.2d 1542, 220 USPQ 193 (Fed. Cir. 1983)) (emphasis
added). "The identical invention must be shown in as complete detail as is contained in the ...
claim." Richardson v. Suzuki Motor Co., 868 F.2d 1226, 1236, 9 USPQ2d 1913, 1920 (Fed. Cir.
1989); MPEP § 2131 (emphasis added).

The Examiner has the burden under 35 U.S.C. § 103 to establish a prima facie case of
obviousness. In re Fine, 837 F.2d 1071, 1074, 5 U.S.P.Q.2d (BNA) 1596, 1598 (Fed. Cir. 1988).
The M.P.E.P. contains explicit direction to the Examiner that agrees with the In re Fine court:

In order for the Examiner to establish a prima facie case of obviousness, three
basic criteria must be met. First, there must be some suggestion or motivation,
either in the references themselves or in the knowledge generally available to one 
of ordinary skill in the art, to modify the reference or to combine reference 
teachings. Second, there must be a reasonable expectation of success. Finally, 
the prior art reference (or references when combined) must teach or suggest all 
the claim limitations. The teaching or suggestion to make the claimed 
combination and the reasonable expectation of success must both be found in the 
prior art, and not based on applicant's disclosure. M.P.E.P. § 2142 (citing In re
Vaeck, 947 F.2d 488, 20 U.S.P.Q.2d (BNA) 1438 (Fed. Cir. 1991)).

The requirement of a suggestion or motivation to combine references in a prima facie case of
obviousness is emphasized in the Federal Circuit opinion In re Sang Su Lee, 277 F.3d 1338, 61
U.S.P.Q.2d 1430 (Fed. Cir. 2002), which indicates that the motivation must be supported by
evidence in the record.

The test for obviousness under § 103 must take into consideration the invention as a 
whole; that is, one must consider the particular problem solved by the combination of elements 
that define the invention. Interconnect Planning Corp. v. Feil, 774 F.2d 1132, 1143, 227
U.S.P.Q. 543, 551 (Fed. Cir. 1985). References must be considered in their entirety, including 
parts that teach away from the claims. See MPEP § 2141.02. 

7.2 The References 

Beausang-1: discloses a scan insertion process for complex circuit synthesis that has a 
reduced set of constraint driven compiler optimizations. See Beausang-1, Col. 4, lines 46-48. A 
three-tiered performance optimization is used within the insertion process: the first tier optimizes 
design for test elements; the second tier optimizes all elements in the design (in addition to 
implementing the first tier); and the third tier performs sequential optimization, as well as local 
optimization (in addition to implementing the second tier). See Beausang-1, Col. 4, lines 50-59. 

Beausang-2: describes a synthesis process of scan insertion/replacement and routing.
See Beausang-2, Col. 31, lines 48-56. Scan resources may be inserted into an integrated circuit 
design organized into hierarchical modules. See Beausang-2, Col. 3, lines 52-54. 

Beausang-3: is directed toward a scan chain design database that defines a hierarchical 
circuit design as a netlist of logic cells, some of which contain scan structure. See Beausang-3, 
Col. 6, lines 47-66. Inferred scan structures may be recognized and replaced with explicit scan 
structures. See Beausang-3, Col. 7, lines 7-28. 



7.3 Discussion of the Rejections 

7.3.1 The Rejections Under § 102:

Claim 18 was rejected under 35 USC § 102(b) as being anticipated by Beausang-3. 
Claims 10, 16 and 17 were rejected under 35 USC § 102(b) as being anticipated by Beausang-1. 
The Appellants do not admit that Beausang-1 or Beausang-3 are prior art, and reserve the right to 
swear behind these references at a later date. In addition, because the Appellants assert that the 
Office has not shown that Beausang-1 or Beausang-3 discloses the identical invention as 
claimed, the Appellants respectfully traverse these rejections of the claims. 

With respect to claim 18, it is noted that Beausang-3 is directed toward a scan chain 
design database that defines a hierarchical circuit design as a netlist of logic cells, some of which 
contain scan structure. See Beausang-3, Col. 6, lines 47-66. While the assertion is made in the 
Office Action that Beausang-3 Figs. 6A, 6B, and 7B illustrate various elements of the claim, it is 
respectfully noted that the cited elements are actually shown as part of creating balanced scan 
chains, and not sharing a repeated circuit structure in a circuit model, as claimed by the 
Appellants. 

FIGs. 6A and 6B of Beausang-3 show before and after snapshots of synthesis results for
two different chains having a single clock domain, while FIGs. 7A and 7B show before and after 
snapshots of synthesis results for two different chains having a mixed clock edge design. See 
Beausang-3, Col. 33, lines 5-50. The Appellants were unable to locate the activities of 
"expanding the repeated circuit structure" (there is no repeated structure shown) and "grafting 
the expanded structure" (since there is no expanded, repeating structure) within the bounds of
Beausang-3, as claimed by the Appellants. 

The Office attempts to explain similarities between Beausang-3 and the claimed 
embodiment by asserting that the insertion of Beausang-3's latch 713 into level 712 (see
Beausang-3, FIGs. 10A-10B) represents "expansion of a repeated circuit structure." A similar
assertion is made with respect to the activity represented by block 335 of Beausang-3's FIG. 2A
(where an unused chain is added to the scan plan - the unused chain being user-defined and
having no scan structure associated with it). See Beausang-3, Col. 27, lines 15-23. However,
neither of these assertions supports the limitation of "expanding the repeated circuit structure once
to form an expanded circuit structure" claimed by the Appellants, since no method of sharing a 
repeated circuit structure is shown. 

As to claim 10, it appears that an element claimed by the Appellants (i.e., "merging a 
plurality of extended latch boundary components into a plurality of partitions") was not 
addressed in the Office Action, and the Appellants were unable to find any indication of its
existence in Beausang-1. The Office attempts to explain that the mere presence of equal
numbers of latch elements 307 and 321 in Beausang-1, FIG. 3B equates to "maintaining a load 
balance within the plurality of partitions", as claimed by the Appellants. However, this is not the 
case. As noted in Beausang-1, "a scan replacement process . . . replaces the non-scan memory 
cells 307a-e of unit 301 with scannable memory cells 320a-e (FIG. 3B) . . .". Thus, there is no 
load balance maintenance whatsoever; the process described by Beausang-1 is one of replacing 
one set of elements with another, and not balancing between partitions. See Beausang-1, Col. 9, 
lines 40-43. 

As to claim 16, it is respectfully noted that there is no indication that the circuit
combination shown in FIG. 13B of Beausang-1 is a result of the claimed process (i.e., a method 
of forming an extended latch boundary component comprising "selecting a path having a first 
node selected from a group consisting of latches and primary outputs and a second node selected 
from a group consisting of latches and primary inputs"). 

As to claim 17, there do not appear to be either "primary inputs" (e.g., only a single
primary input 307 is shown in FIG. 3A) or "primary outputs" (none are shown) as set forth in the
claim (i.e., "a path comprising a plurality of first nodes selected from a group consisting of latches
and primary outputs and a plurality of second nodes selected from a group consisting of latches
and primary inputs, where the path can include a plurality of latches between the plurality of first
nodes and the plurality of second nodes").

AMENDMENT & RESPONSE UNDER 37 C.F.R. 1.116 - EXPEDITED PROCEDURE
Serial Number: 09/347,690 Dkt: 884.107US1 (INTEL)
Filing Date: July 2, 1999
Title: LOGIC VERIFICATION IN LARGE SYSTEMS
Assignee: Intel Corporation

A question regarding what may comprise a "primary input" or "primary output" was
raised in the Office Action. These terms are well-understood by those of skill in the art of circuit
synthesis. Numerous references make use of these terms. For example, as explained in
Wireplanning in Logic Synthesis:

"A logic circuit L is a 3-tuple (I, O, F). I is a set of primary input pins, or simply
primary inputs. O is a set of primary output pins, or simply primary outputs.
Each element of I and O is a binary variable." Wireplanning in Logic Synthesis,
ICCAD98, pg. 26 (attached hereto as part of Appendix B).

Similarly, as noted in Logic Synthesis Preserving High-Level Specification: 

"Let S be a single output combinational circuit of multi-valued blocks specified 
by a directed acyclic graph H. The sources and the sink of H correspond to 
primary inputs and the output of S. Each non-source node of H corresponds to a 
multivalued block computing a multi-valued function of multivalued arguments. 
Each node of n of H is associated with a multi-valued variable A. If n is a source
of H, then the corresponding variable specifies values taken by the corresponding
primary input of S. If n is a non-source node of S then the corresponding variable
describes the values taken by the output of the block specified by n. If n is a
source (respectively the sink), then the corresponding variable is called a primary 
input variable (respectively primary output variable)." Logic Synthesis Preserving 
High-Level Specification, E.Goldberg (Cadence Berkeley Labs, USA), located at 
http://eigold.tripod.com/papers/iwls-2004.pdf (attached hereto as part of 
Appendix B). 

Finally, while it is asserted in the Office Action that Beausang-1 teaches the combination 
of "a plurality of first nodes (Figure 15A . . .), with a plurality of output latches (Figure 3B)," the 
Appellants can find no logical connection between these two elements within the bounds of 
Beausang-1. The cited portions of this reference (FIGS. 13B, 15A, and 3B) are directed to a
scan insertion process to (i) move a load from a Q logic output to a g logic output to adjust the 
load input phase; (ii) move a load to a logically-equivalent driving input so the original driver 
can be downsized; and (iii) replace HDL-specified, non-scan memory cells with DFT scannable 
memory cells. Beausang-1, Col. 6, line 15 - Col. 7, line 6; and Col. 24, line 58 - Col. 25, line 24.
Thus, it does not appear that it is possible for the cited combination to exist according to the 
teachings of Beausang-1. 

"The identical invention must be shown in as complete detail as is contained in the ... 
claim." Richardson v. Suzuki Motor Co., 868 F.2d 1226, 1236, 9 USPQ2d 1913, 1920 (Fed. Cir. 
1989); MPEP § 2131 (emphasis added). Therefore, since what is disclosed by Beausang-1 and 
Beausang-3 is not identical to the subject matter of the embodiments claimed, the rejection of 
claims 10 and 16-18 under § 102(b) is improper. Reconsideration and allowance are respectfully 
requested. 

7.3.2 The Rejections Under § 103:

Claims 1, 2 and 21 were rejected under 35 USC § 103(a) as being unpatentable over 
Beausang-1 in view of Beausang-2. Claim 14 was rejected under 35 USC § 103(a) as being 
unpatentable over Beausang-1 in view of Beausang-3. First, the Appellants do not admit that 
Beausang-1, Beausang-2, or Beausang-3 are prior art, and reserve the right to swear behind these 
references in the future. Second, since a prima facie case of obviousness has not been
established in each case, the Appellants respectfully traverse these rejections. 

No proper prima facie case of obviousness has been established because (1) combining 
the references does not teach all of the limitations set forth in the claims, (2) there is no 
motivation to combine the references, and (3) combining the references provides no reasonable 
expectation of success. Each of these points will be explained in detail, as follows. 

Combining References Does Not Teach All Limitations: First, with respect to 
independent claims 1 and 21, no combination suggested in the Office Action will render all of 
the claim limitations. It is admitted by the Office that Beausang-1 does not disclose "partitioning 
the plurality of extended latch boundary components" as claimed by the Appellants. Neither 
does Beausang-2. 

Beausang-2 is directed to a synthesis process of scan insertion/replacement and routing. 
See Beausang-2, Col. 31, lines 48-56. The Appellants were unable to find any reference within 
the bounds of Beausang-2 to partitioning "extended latch boundary components." Rather, the
figures cited in the Office Action (Beausang-2, FIGs. 2A and 6A-14B) illustrate various portions
of scan chains; partitioning extended latch boundary components to serve any of the purposes 
noted by the Appellants in the Application is not taught. The text cited in the Office Action 
(Beausang-2, Col. 26, line 50 - Col. 27, line 17) refers to selecting "partition blocks" to balance 
scan chains, and not to partitioning extended latch boundary components. The logic cells of the
database remain unaltered. Thus, independent claims 1 and 21 are nonobvious. This conclusion 
applies with even greater force respecting dependent claim 2, since any claim depending from a 
nonobvious independent claim is also nonobvious. See M.P.E.P. § 2143.03. 

Second, with respect to independent claim 14, the Office admits that Beausang-1 does not 
disclose "grouping a plurality of extended latch boundary components into a plurality of 
partitions", as claimed by the Appellants. Neither does Beausang-3.

Beausang-3 is also directed to a synthesis process of scan insertion/replacement and 
routing. See Beausang-3, Col. 32, lines 22-42. The Appellants were unable to find any 
reference within the bounds of Beausang-3 to "grouping a plurality of extended latch boundary 
components into a plurality of partitions." Rather, the cited segments of Beausang-3 (Beausang-3,
Col. 12, lines 45-55) serve to define scan groups, scan links, and scan chains; grouping extended
latch boundary components to serve any of the purposes noted by the Appellants in the 
Application is not taught. Thus, independent claim 14 is nonobvious. 

No Motivation to Combine References: The Office asserts that one would be motivated 
to combine Beausang-1 with Beausang-2 or Beausang-3 so that a designer would be able to "sign
off" his or her work at the completion of module design, without later disruption. See Office
Action, Paper 10-24-04, Pg. 8. However, as pointed out in the Office Action, Beausang-1 
already provides an enabling solution. 

The assertion is made by the Office that, nevertheless, one of skill in the art "would have
been motivated to find a better solution to decrease the amount of time required to compile a
design." It is further asserted that it would therefore have been obvious to combine Beausang-1
with either Beausang-2 or Beausang-3 so that "the IC designer can now 'sign off' his or her work
at the completion of the module design" so that completed/optimized modules do not later have
to be disrupted. However, the Appellants do not see how enabling designer sign-off
necessarily satisfies the motivation noted in the Office Action. In fact, the combination may
require extra time to finalize individual module designs, potentially creating a longer 
compilation time, since each module must now be perfected by its individual designer prior to 
top level analysis. See Beausang-2, Col. 3, lines 3-11 and Beausang-3, Col. 3, lines 4-12. Thus, 
there is no motivation to combine Beausang-1 with either Beausang-2 or Beausang-3.

Finally, it is noted in the Office Action that all three references have the same inventor, 
James Beausang, and that this somehow constitutes a motivation to combine them. However, 
combining references in a way that prevents achieving the goals expressed in the references does 
not constitute a motivation to combine them. 

Since Beausang-1 teaches away from the suggested combinations, the use of unsupported 
assertions in the Office Action does not satisfy the explicit requirements needed to demonstrate 
motivation as set forth by the In re Sang Su Lee court. Therefore, the Examiner appears to be 
using personal knowledge, and is again respectfully requested to submit an affidavit as required 
by 37 C.F.R. § 1.104(d)(2).

No Reasonable Expectation of Success: As has been previously noted, modifying 
Beausang-1 to implement independent completion of modules by various designers may create 
additional barriers to overall design completion, without providing a "system that can reduce the 
time required to perform circuit synthesis ..." See Beausang-1, Col. 4, lines 18-19. Introducing 
a human element into an electronic design process rarely speeds anything up; in fact, "[d]esign, 
checking and testing of large scale integrated circuits are so complex that the use of programmed 
computer systems are required for realization of normal circuits." See Beausang-1, Col. 1, lines
21-23. 

In addition, as noted above, several elements of claims 1-2, 14, and 21 are not provided 
by any of the cited references. Thus, there is no reasonable expectation that any combination of
Beausang-1, Beausang-2, and Beausang-3 will be able to provide the missing elements, such
as "decomposing", "partitioning", and "grouping" extended latch boundary components, as 
claimed by the Appellants. 

The test for obviousness under § 103 must take into consideration the invention as a 
whole; that is, one must consider the particular problem solved by the combination of elements 
that define the invention. Interconnect Planning Corp. v. Feil, 774 F.2d 1132, 1143, 227
U.S.P.Q. 543, 551 (Fed. Cir. 1985). References must be considered in their entirety, including
parts that teach away from the claims. See MPEP § 2141.02. The fact that references can be
combined or modified does not render the resultant combination obvious unless the prior art also
suggests the desirability of the combination. In re Mills, 16 USPQ2d 1430 (Fed. Cir. 1990);
M.P.E.P. § 2143.01.

Therefore, since there is no evidence in the record to support disclosure by any of
Beausang-1, Beausang-2, or Beausang-3 of "decomposing", "partitioning", and "grouping"
extended latch boundary components, since there is no motivation to supply the missing
elements (since the references teach away from such a combination), and since no reasonable
expectation of success arises, a prima facie case of obviousness has not been established with
respect to independent claims 1, 14, and 21. This conclusion also applies to dependent claim 2,
since any claim depending from a nonobvious independent claim is also nonobvious. It is 
therefore respectfully requested that the rejections of claims 1-2, 14, and 21 under 35 U.S.C. § 
103 be reconsidered and withdrawn. 

8. SUMMARY

It is respectfully submitted that no prima facie case of anticipation under 35 U.S.C. §102, 
nor of obviousness under 35 U.S.C. §103 has been established by the Office. Therefore, it is 
respectfully requested that the rejections of claims 1-2, 10, 14, 16-18, and 21 be reconsidered and 
withdrawn. The Appellants respectfully submit that all of the claims are in condition for 
allowance and notification to that effect is earnestly requested. The Examiner is invited to 
telephone the Appellants' attorney, Mark Muller at (210) 308-5677, or the undersigned attorney 
at (612) 373-6970, to facilitate prosecution of this Application. If necessary, please charge any 
additional fees or credit overpayment to Deposit Account No. 19-0743. 



Respectfully submitted, 
MANPREET S. KHAIRA ET AL. 

APPENDIX A - CLAIMS



1. (Original) A method of preparing a circuit model for simulation comprising:

decomposing the circuit model having a number of latches into a plurality of extended
latch boundary components; and

partitioning the plurality of extended latch boundary components.



2. (Original) The method of claim 1, wherein decomposing a circuit model having a
number of latches into a plurality of extended latch boundary components comprises:

decomposing at least one of a plurality of hierarchical cells into one of the plurality of
extended latch boundary components.

3. (Original) The method of claim 2, wherein partitioning the plurality of extended latch
boundary components comprises:

using a constructive bin-packing heuristic to partition the plurality of extended latch
boundary components.

4. (Original) The method of claim 3, wherein using a constructive bin-packing heuristic to
partition the plurality of extended latch boundary components comprises:

constructing a plurality of seeds from the plurality of extended latch boundary
components; and

merging the plurality of extended latch boundary components with the plurality of seeds.

5. (Original) The method of claim 1, wherein decomposing a circuit model having a
number of latches into a plurality of extended latch boundary components comprises:

identifying an extended latch boundary component that meets a size constraint for at least
one of a plurality of hierarchical cells.

6. (Original) The method of claim 5, wherein partitioning the plurality of extended latch 
boundary components comprises: 

grouping the plurality of extended latch boundary components into a plurality of
partitions by approximately equalizing the number of latches in each of the plurality of 
partitions. 

7. (Original) The method of claim 1, wherein partitioning the plurality of extended latch 
boundary components comprises: 

grouping the plurality of extended latch boundary components to form a plurality of 
partitions, each of the plurality of partitions having a size. 

8. (Original) The method of claim 7, wherein partitioning the plurality of extended latch
boundary components comprises: 

partitioning the plurality of extended latch boundary components by approximately 
equalizing the number of latches in each of the plurality of partitions, approximately equalizing
the latches that are activated in each of the plurality of partitions, and approximately equalizing 
the size of each of the plurality of partitions. 

9. (Original) The method of claim 1, wherein partitioning the plurality of extended latch 
boundary components comprises: 

attempting to partition the plurality of extended latch boundary components based on 
activity load balancing. 

10. (Currently Amended) A method of preparing a circuit model for simulation, the circuit 
model having a model size, and the method comprising: 

merging a plurality of extended latch boundary components into a plurality of partitions 
having a partition size; and 

maintaining a load balance within the plurality of partitions. 

11. (Original) The method of claim 10, further comprising:

reducing circuit overlap within the plurality of partitions.

12. (Original) The method of claim 11, further comprising: 

adjusting the load balance to obtain a partition size of less than about 110% of the model
size.

13. (Original) The method of claim 12, further comprising:

adjusting the load balance to obtain a partition size of less than about 120% of the model
size.

14. (Original) A method of preparing a circuit model for a simulation having a total 
simulation time, the method comprising: 

grouping a plurality of extended latch boundary components into a plurality of partitions;
and

reducing the communication time within the plurality of partitions by adjusting the 
grouping. 

15. (Original) The method of claim 14, further comprising: 

reducing the communication time within the plurality of partitions to less than about ten 
percent of the total simulation time by adjusting the grouping. 

16. (Original) A method of forming an extended latch boundary component comprising:

selecting a path having a first node selected from a group consisting of latches and
primary outputs and a second node selected from a group consisting of latches and primary
inputs, wherein the path can include a latch between the first node and the second node.

17. (Original) A latch boundary component comprising: 

a path comprising a plurality of first nodes selected from a group consisting of latches 
and primary outputs and a plurality of second nodes selected from a group consisting of latches 
and primary inputs, where the path can include a plurality of latches between the plurality of first 
nodes and the plurality of second nodes. 

18. (Original) A method of sharing a repeated circuit structure in a circuit model, the method 
comprising: 

expanding the repeated circuit structure once to form an expanded circuit structure; and 
grafting the expanded circuit structure to the circuit model as needed.

19. (Original) The method of claim 18, wherein grafting the expanded circuit structure to the
circuit model as needed comprises: 

copying a table representing the expanded circuit structure into the circuit model. 

20. (Original) The method of claim 18, wherein grafting the expanded circuit structure to the 
circuit model as needed comprises: 

altering a table representing the circuit model to add the expanded circuit structure. 

21. (Original) A method of simulating a circuit model, the method comprising:

partitioning a plurality of extended latch boundary components to form a plurality of
partitions having a size;

preparing a plurality of simulations from the plurality of partitions; and 
executing the plurality of simulations on a processing unit. 

22. (Original) The method of claim 21, further comprising: 
adjusting the size of the plurality of partitions. 

23. (Original) The method of claim 22, wherein executing the plurality of simulations on a 
processing unit comprises: 

executing the plurality of simulations on a plurality of distributed processors. 

24. (Original) A computer system comprising: 
a processor unit; 

a dicing unit operably coupled to the processor unit, capable of executing on the 
processor unit, and capable of decomposing a circuit model into a plurality of extended latch 
boundary components, and capable of partitioning the plurality of extended latch boundary 
components; and 

a simulation unit operably coupled to the dicing unit and the processor unit, and capable 
of executing on the processor unit. 

25. (Original) The computer system of claim 24, wherein the processor unit is a plurality of 
distributed processor units.

26. (Original) The computer system of claim 25, wherein the dicing unit is capable of load
balancing. 

27. (Original) The computer system of claim 26, wherein the dicing unit is capable of 
activity load balancing. 

28. (Previously Presented) A computer-readable medium having computer-executable 
instructions, wherein the computer-executable instructions, when accessed, result in a machine 
performing: 

partitioning a circuit model into a plurality of cells arranged in a hierarchy; and 
mapping a plurality of extended latch boundary components into the circuit model by 
finding each cell in the plurality of cells that is highest in the hierarchy such that a single 
extended latch boundary component satisfying a given size constraint can be mapped into the 
cell.

APPENDIX B - EVIDENCE



INSERT #1: WIREPLANNING IN LOGIC SYNTHESIS, 8 pgs. 

INSERT #2: LOGIC SYNTHESIS PRESERVING HIGH-LEVEL SPECIFICATION, 8 
pgs.

Logic synthesis preserving high-level specification



E. Goldberg (Cadence Berkeley Labs, USA),



Abstract. In this paper we develop a method of logic synthesis
that preserves the high-level structure of the circuit to be
synthesized. This method is based on the fact that two 
combinational circuits implementing the same "high-level" 
specification can be efficiently checked for equivalence. Hence, 
logic transformations preserving a predefined specification can 
be made efficiently. We introduce the notion of toggle equivalence 
of Boolean functions and show that toggle equivalence can be 
used for making gate level transformations that preserve a 
predefined specification. We describe a practical procedure for 
checking toggle equivalence of two Boolean circuits and give 
experimental data about its performance. 



1. Introduction 

In a typical design flow, by the time a circuit is passed 
to a logic synthesis procedure, the "high-level" structure of 
this circuit is lost. Suppose, for example, that a 
combinational circuit is initially described as a network of 
multi-valued blocks. After encoding all the multi-valued 
variables, this network is replaced with a Boolean circuit 
that is optimized using a set of local transformations that 
ignore the original high-level structure of the circuit. 

The flaw of the approach above is that in the case of a 
poor choice of encodings for multi-valued variables, a 
synthesis procedure using local transformations will not be 
able to "correct" these encodings. On the other hand, 
finding good encodings at the level of multi-valued blocks 
is hard and so the probability of generating bad encodings 
is high. A possible solution to the problem is to perform 
logic transformations that re-encode multi-valued variables 
"implicitly" at the gate level. We will call such 
transformations High-Level structure aware Logic
Synthesis (HLLS). This is because implicitly re-encoding
multi-valued variables essentially means synthesizing a
circuit that is a different implementation of the same 
"specification" as the original circuit. An HLLS procedure 
can be used as an extra optimization step taken before using 
logic synthesis based on local transformations. 

In this paper we introduce a method of HLLS for 
combinational circuits. To design an HLLS procedure one 
has to solve two problems. The first problem is to verify the 
correctness of logic transformations. A "regular" logic 
synthesis procedure makes local transformations so the 
equivalence checking of the original and optimized circuits 
is usually not an issue. On the other hand, an HLLS 
procedure performs "non-local" synthesis transformations, 
so verification of the correctness of such transformations
may pose a problem. This problem was addressed in [3], [4]
where it was shown that if two Boolean circuits have a
common specification (CS), their equivalence checking is
"easy". Informally, circuits N_1 and N_2 have a CS if they can
be considered as two different implementations of a circuit
S of multi-valued gates further referred to as blocks. (S is
called a CS of N_1 and N_2.) In [3], [4] it was proven that
given a CS of circuits N_1 and N_2, there is an equivalence
checking procedure whose complexity is linear in the
number of blocks of S and exponential in the granularity of
S (the "size" of the largest block of S). An example of
circuits N_1, N_2 having a common specification of three
blocks is shown in Fig. 1. The specification itself is shown
on the left. Here N_1(G_k) and N_2(G_k) are subcircuits of N_1
and N_2 respectively implementing the same block of
specification.



Figure 1. Circuits N_1 and N_2 with a common specification of
three blocks

The second problem one has to solve in HLLS, is to 
find a way to preserve a predefined specification by 
making transformations at the gate level. Solving this 
problem is the focus of this paper. We introduce the 
notion of toggle equivalence of multi-output Boolean 
functions that is a generalization of regular functional 
equivalence. We show that circuits N_1 and N_2 have a CS if
they can be partitioned into toggle equivalent subcircuits
(see Section 4). This result suggests a simple way to
perform HLLS. Suppose that a specification S of circuit N_1
to be optimized is specified by partitioning the latter into
subcircuits. Then, if we replace subcircuits of N_1 with
"better" subcircuits that are toggle equivalent to the replaced
ones, we produce a circuit N_2 that implements the same
specification S as N_1. To be viable, HLLS needs an efficient
algorithm for checking toggle equivalence. We describe
such an algorithm and demonstrate its efficiency on
benchmark circuits.

The paper is structured as follows. In Section 2 we 
formally define the notion of a common specification. In
Section 3 we reformulate the algorithm of equivalence
checking from [3] removing some redundancy. Section 4 
introduces the notion of toggle equivalence of Boolean 
functions. The relation between the notions of toggle 
equivalence and common specification is shown in Section 
5. In Section 6 we describe a method of HLLS. The 
relation of HLLS to other synthesis procedures is discussed 
in Section 7. The description of a procedure for checking 
toggle equivalence and its performance on benchmark 
circuits are given in Section 8. Finally, we draw some 
conclusions in Section 9. 



2. Definition of common specification 

In this section, we formally define the notion of a 
common specification of Boolean circuits. Let S be a single
output combinational circuit of multi-valued blocks
specified by a directed acyclic graph H. The sources and
the sink of H correspond to primary inputs and the output of
S. Each non-source node of H corresponds to a multi-valued
block computing a multi-valued function of multi-valued
arguments. Each node n of H is associated with a
multi-valued variable A. If n is a source of H, then the
corresponding variable specifies values taken by the
corresponding primary input of S. If n is a non-source
node of S then the corresponding variable describes the
values taken by the output of the block specified by n. If n
is a source (respectively the sink), then the corresponding
variable is called a primary input variable (respectively
primary output variable). We will use the notation
C = G(A_1, A_2, ..., A_k) to indicate that a) the output of a block
G is associated with a variable C; b) the function computed
by the block G is G(A_1, A_2, ..., A_k); c) only k nodes of H are
connected to the node n in H and outputs of these nodes
are associated with variables A_1, A_2, ..., A_k.
Denote by D(A) the domain of the variable A
associated with a node of H. The value of |D(A)| is called
the multiplicity of A. If the multiplicity of every variable A
of S is equal to 2 then S is a Boolean circuit.

Now we introduce the notion of a specification of a 
single output Boolean circuit N. Informally, a multi-valued 
circuit S is a specification of N if N can be obtained from S
by picking proper encodings of internal variables of S. In 
the following exposition any multi-valued network is called 
a specification. 

Definition 1. Let D(A) = {a_1, ..., a_t} be the domain of a
variable A of S. Denote by q(A) a Boolean encoding of the
values of D(A), which is a mapping q: D(A) → {0,1}^m such
that a_i ≠ a_j ⇒ q(a_i) ≠ q(a_j). The value of q(a_i), a_i ∈ D(A), is
called the code of a_i. Denote by length(q(A)) the number of
bits in q(a_i), i.e. the value of m. Denote by v(A) the set of m
coding Boolean variables.



In this paper, we make the assumptions below about
specifications and implementations.

Assumption 1. A specification contains only one output.
All primary input variables and the primary output variable 
of a specification are Boolean. 

Assumption 2. Every gate of a Boolean circuit 
(implementation) has two inputs and one output. Every 
multi-valued block of a specification has only one output 
but the number of inputs of a block is not fixed. 
Assumption 3. If A_1 and A_2 are two different variables of a
specification, then v(A_1) ∩ v(A_2) = ∅.
Definition 2. Let C = G(A_1, A_2, ..., A_k) be a block of
specification S. Let q(A_1), ..., q(A_k), q(C) be encodings of
variables A_1, A_2, ..., A_k and C respectively. A Boolean circuit N
is said to implement the block G if N implements the
completely specified Boolean function
q(C) = G(q(A_1), ..., q(A_k)) whose truth table is obtained from
that of G by replacing values of A_1, ..., A_k, C with their codes.
Definition 3. Let S be a multi-valued circuit. A single
output Boolean circuit N is said to implement the
specification S, if N can be built from S by the following
two rules.

1) Each block G of S is replaced with its implementation
(denote it by N(G)).

2) Let the output of block G_1 (specified by variable C) be
connected to an input of block G_2 (specified by the same
variable C) in S. Then the outputs of the circuit N(G_1) are
properly connected to inputs of N(G_2). Namely, if a primary
output of N(G_1) is connected to an input of N(G_2), these input
and output are specified by the same coding variable of
v(C).

Remark 1. It is important to emphasize that the fact that a
circuit N has a specification S does not necessarily mean that
N is produced from S by encoding its multi-valued
variables. It just means that N can be produced from S.
Remark 2. Let N be an implementation of a specification
S. Let p be the largest number of gates used in an
implementation of a multi-valued block of S in N. We will
say that S is a specification of granularity p for N.
Definition 4. Let N_1, N_2 be two functionally equivalent
single output Boolean circuits. Let N_1, N_2 implement a
specification S. Then S is called a common specification
(CS) of N_1 and N_2.

Definition 5. Let S be a CS of N_1, N_2. Let p_1 (respectively
p_2) be the granularity of S with respect to N_1 (respectively
N_2). Then we will say that S is a CS of N_1, N_2 of granularity
p = max(p_1, p_2).

3. Equivalence checking with a known 
common specification 

In this section, we recall the equivalence checking 
algorithm of [3] and give a slightly modified version of it. 



Let N_1 and N_2 be Boolean circuits with a CS S. Let G_k be a
block of S. We will denote by N_1(G_k) and N_2(G_k) the
implementations of the block G_k in N_1 and N_2 respectively.
Definition 6. Let S be a CS of N_1 and N_2. The topological
level of a block G_k in a specification S is the length of the
longest path from a primary input of S to G_k. (The length of
a path is measured in the number of blocks on it. The
topological level of a primary input is assumed to be 0.)
Denote by level(G_k) the topological level of G_k in S. Denote
by level(N_i(G_k)) the topological level of the implementation of
block G_k in N_i, i = 1, 2, which is assumed to be equal to
level(G_k).

Definition 7. Let N_1(G_k) and N_2(G_k) be implementations of
a multi-valued block G_k whose output is associated with
variable C. Let q_1(C) and q_2(C) be encodings of the variable
C used in implementations N_1(G_k) and N_2(G_k). The function
Cf(v_1(C), v_2(C)) is called a correlation function of
encodings q_1(C), q_2(C) if

a) Cf(z_1, z_2) = 1 for any assignment z_1 to v_1(C) and z_2 to v_2(C)
such that z_1 = q_1(c) and z_2 = q_2(c) where c ∈ D(C),

b) otherwise Cf(z_1, z_2) = 0.

Definition 8. Let N be a Boolean circuit. Denote by v(N)
the set of Boolean variables associated with the output of
gates of N. Denote by Sat(v(N)) the Boolean function such
that Sat(z) = 1 iff the assignment z to variables v(N) is
"possible", i.e. consistent. For example, if circuit N consists
of just one AND gate y = x_1 ∧ x_2, then v(N) = {y, x_1, x_2} and
Sat(v(N)) = (¬x_1 ∨ ¬x_2 ∨ y) ∧ (x_1 ∨ ¬y) ∧ (x_2 ∨ ¬y).
Definition 9. Let f be a Boolean function. We will say that
function f* is obtained from f by existentially quantifying
away variable x if f* = f(..., x=0, ...) ∨ f(..., x=1, ...).
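
Definitions 8 and 9 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: a Boolean function is represented by the set of its satisfying assignments, and the names `sat_and_gate`, `sat_cnf` and `exists` are hypothetical helpers introduced here, not part of the paper.

```python
from itertools import product

def sat_and_gate():
    """Sat(v(N)) for a one-gate circuit y = x1 AND x2.

    Returns the set of consistent assignments (x1, x2, y).
    """
    return {(x1, x2, y)
            for x1, x2, y in product((0, 1), repeat=3)
            if y == (x1 & x2)}

def sat_cnf(x1, x2, y):
    """The same function written as the three-clause CNF of Definition 8."""
    return bool(((1 - x1) or (1 - x2) or y)
                and (x1 or (1 - y))
                and (x2 or (1 - y)))

def exists(assignments, index):
    """Existentially quantify away the variable at position `index`.

    f* = f(..., x=0, ...) OR f(..., x=1, ...): an assignment to the
    remaining variables satisfies f* iff some value of x extends it
    to a satisfying assignment of f.
    """
    return {a[:index] + a[index + 1:] for a in assignments}

sat = sat_and_gate()
# Quantifying away y keeps every (x1, x2) pair: all inputs are possible.
inputs_only = exists(sat, 2)
# Quantifying away x1, then x2 (now at position 0), keeps the possible values of y.
outputs_only = exists(exists(sat, 0), 0)
```

The set representation is exponential in the number of variables, so it is suitable only for very small circuits such as this one.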

In [3] it was shown that if N_1 and N_2 have a CS S, one
can check them for equivalence in time linear in the
number of blocks of S and exponential in the granularity of 
S. The essence of that algorithm is to compute so-called 
filtering and correlation functions in topological order. Here 
we give a modified version of this algorithm. The 
modification is that we discard computation of filtering 
functions from the algorithm. 

Here is an informal proof that computation of filtering
functions is not necessary. Let C be the variable associated
with the output of a block G_k of S. From definitions of
filtering (denoted by Ff) and correlation functions given in
[3] it follows that

Ff(v_1(C)) ∧ Ff(v_2(C)) ∧ Cf(v_1(C), v_2(C)) = Cf(v_1(C), v_2(C)).

So filtering functions can be dropped from the proof of
Proposition 7 of [3] used to formulate the main result, i.e.
Proposition 8. (The use of filtering functions makes sense
though, if one relaxes the definition of a correlation
function. However, in this paper we stick to the definition
of a correlation function given in [3].)



In the modified algorithm, only correlation functions
are computed in topological order of N_1 and N_2. The
algorithm starts with block implementations N_1(G_k), N_2(G_k)
of level 1, then processes implementations of level 2 and so
on. Let level(N_1(G_k)) = level(N_2(G_k)) = 1 (i.e. inputs of N_1(G_k)
and N_2(G_k) are primary inputs of N_1 and N_2). Let C be the
variable associated with the output of G_k and
Cf(v_1(C), v_2(C)) be the correlation function relating
encodings q_1(C) and q_2(C). This function is obtained from
the function Sat(v(N_1(G_k))) ∧ Sat(v(N_2(G_k))) by existentially
quantifying away all the variables except the variables
associated with the outputs of N_1(G_k) and N_2(G_k).

Suppose level(N_1(G_k)) = level(N_2(G_k)) = k and the
correlation functions have been computed for the
implementations of levels less than k. Let the output of G_k
be associated with variable C_k. Let the inputs of G_k be
connected to the outputs of blocks G_k1, ..., G_km associated
with variables C_k1, ..., C_km respectively. Then the correlation
function Cf(v_1(C_k), v_2(C_k)) is obtained from the function
Cf(v_1(C_k1), v_2(C_k1)) ∧ ... ∧ Cf(v_1(C_km), v_2(C_km)) ∧ Sat(v(N_1(G_k)))
∧ Sat(v(N_2(G_k))) by existentially quantifying away all the
variables except the ones associated with the outputs of N_1(G_k)
and N_2(G_k). Eventually the correlation function Cf(f_1, f_2) is
computed where f_1 and f_2 are Boolean variables specifying
the outputs of N_1 and N_2. If Cf(f_1, f_2) = (f_1 ∨ ¬f_2) ∧ (¬f_1 ∨ f_2)
(which is the equivalence function), then circuits N_1 and N_2
are functionally equivalent.
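
The level-by-level computation can be sketched in Python as follows. This is a simplified illustration, not the procedure of [3]: each block implementation is modeled as a function from input codes to an output code, so the internal gate variables that Sat would mention are already quantified away, and a correlation function is stored as the set of code pairs on which it equals 1. The three-valued block "number of ones among x1, x2" followed by a majority block, the binary and thermometer encodings, and all function names are assumptions chosen for the example.

```python
from itertools import product

# Correlation function of a primary input: it is related only to itself.
EQ = {((0,), (0,)), ((1,), (1,))}

def correlation(impl1, impl2, input_cfs):
    """Correlation function of one block, given those of its inputs.

    Enumerating the pairs allowed by `input_cfs` plays the role of the
    conjunction of input correlation functions; collecting only the
    output codes plays the role of existential quantification.
    """
    result = set()
    for combo in product(*input_cfs):
        in1 = tuple(pair[0] for pair in combo)  # input codes seen in N_1
        in2 = tuple(pair[1] for pair in combo)  # input codes seen in N_2
        result.add((impl1(in1), impl2(in2)))
    return result

# Level 1 block: C = number of ones among x1, x2 (three values).
def g1_binary(ins):            # N_1 uses a binary code: (high bit, low bit)
    (x1,), (x2,) = ins
    return (x1 & x2, x1 ^ x2)

def g1_thermo(ins):            # N_2 uses a thermometer code: (C >= 1, C >= 2)
    (x1,), (x2,) = ins
    return (x1 | x2, x1 & x2)

# Level 2 block: output is 1 iff C + x3 >= 2.
def g2_binary(ins):
    (hi, lo), (x3,) = ins
    return (hi | (lo & x3),)

def g2_thermo(ins):
    (t1, t2), (x3,) = ins
    return (t2 | (t1 & x3),)

cf_c = correlation(g1_binary, g1_thermo, [EQ, EQ])
cf_out = correlation(g2_binary, g2_thermo, [cf_c, EQ])
# The output correlation function is the equivalence function,
# so the two circuits are functionally equivalent.
equivalent = cf_out == EQ
```

Each call enumerates only the codes of one block's inputs, which reflects why the cost grows with the size of the largest block rather than with the size of the whole circuit.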

Definition 10. Given two functionally equivalent Boolean
circuits N_1, N_2, a CS S is called the finest common specification
if it has the smallest granularity p among all the CSs of N_1
and N_2.

The complexity of the described algorithm is
exponential in the granularity of S (i.e. in the size of the
maximal subcircuit N_i(G_k)). So to evaluate the complexity
of equivalence checking of N_1 and N_2 correctly, one needs
to know the finest CS of N_1 and N_2 or a CS whose
granularity is close to that of the finest one.

4. Toggle equivalence of Boolean functions 

In this section, we introduce the notion of toggle 
equivalence. We also show that toggle equivalent Boolean 
functions can be considered as different implementations of 
the same multi-valued function. 

Definition 11. Let f_1: {0,1}^n → {0,1}^m and f_2: {0,1}^n →
{0,1}^k be m-output and k-output Boolean functions of the
same set of variables. Functions f_1 and f_2 are called toggle
equivalent if f_1(x) ≠ f_1(x') ⇔ f_2(x) ≠ f_2(x'). Circuits N_1 and N_2
implementing toggle equivalent functions f_1 and f_2 are
called toggle equivalent circuits.



Remark 3. Toggle equivalence means that for any pair of
input vectors x, x' for which at least one output of f_1
"toggles", the same is true for f_2 and vice versa.
Definition 12. Let f be a multi-output Boolean function of n
arguments. Denote by Part(f) the partition of the set {0,1}^n
into blocks B_1, ..., B_k such that f(x) = f(x') if x, x' are in the
same block and f(x) ≠ f(x') if x, x' are in different blocks.
Proposition 1. Let f_1 and f_2 be toggle equivalent. Then
Part(f_1) = Part(f_2), i.e. for each block B_i of Part(f_1) there is a
block B'_j of Part(f_2) such that B_i = B'_j and vice versa.
Proof. Assume the contrary, i.e. Part(f_1) ≠ Part(f_2). Then
there is a block B_i of Part(f_1) such that no block B'_j of
Part(f_2) is equal to B_i. Then only the two cases below are
possible.

a) B_i contains at least two input vectors. Then there is
a pair of vectors x, x' of block B_i such that they are in
different blocks of Part(f_2). This means that f_1(x) = f_1(x')
while f_2(x) ≠ f_2(x'), i.e. f_1 and f_2 are not toggle equivalent.
So we have a contradiction.

b) B_i contains only one input vector x. Let block B'_j of
Part(f_2) contain vector x. Block B'_j also contains at least
one more input vector because otherwise B_i = B'_j. So the
block B'_j contains two input vectors that are in different
blocks of Part(f_1). Hence f_1 and f_2 are not toggle equivalent
and we again have a contradiction. □

Proposition 2. Let f_1 and f_2 be toggle equivalent single
output Boolean functions. Then f_1 = f_2 or f_1 = ¬f_2, where ¬
means negation.

Proof. From Proposition 1 it follows that Part(f_1) = Part(f_2).
Since f_1, f_2 are Boolean functions, Part(f_1) and Part(f_2)
contain two blocks each. So the only two alternatives are
f_1 = f_2 or f_1 = ¬f_2.

Proposition 3. Let f_1 and f_2 be two multi-output Boolean
functions of n Boolean variables such that Part(f_1) = Part(f_2).
Then f_1 and f_2 are toggle equivalent.

Proof can be performed by reasoning in the same way as in
the proof of Proposition 1.

Proposition 4. Let F be a multi-valued function of n
Boolean variables. Let C be the multi-valued variable
specifying the output of F. Let f_1 and f_2 be Boolean
functions obtained from F by using encodings q_1 and q_2 (of
possibly different length) for the values of C. Then f_1 and f_2
are toggle equivalent.

Proof. According to Definition 1 different values of C are
assigned different codes. So Part(f_1) = Part(f_2) = Part(F)
where Part(F) = {B_1, ..., B_k} and B_i consists of all the input
vectors for which F takes the same value of C. From
Proposition 3 it follows that f_1 and f_2 are toggle equivalent.

Proposition 5. Let f_1 and f_2 be toggle equivalent. Then f_1
and f_2 are two different "implementations" of the same
multi-valued function of Boolean variables.

Proof. According to Proposition 1, Part(f_1) = Part(f_2). Let
Part(f_1), Part(f_2) contain k blocks. Then f_1 and f_2 are
implementations of the function F: {0,1}^n → {1, ..., k} where
F(x) = m iff x is in the m-th block of Part(f_1).
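
As a hedged illustration of Definitions 11 and 12 and Propositions 1 to 3, the following Python sketch checks toggle equivalence by brute force and compares it with equality of the partitions Part(f). The helper names and the example functions (a binary and a thermometer encoding of the number of ones among two inputs) are assumptions made for the example.

```python
from itertools import product

def part(f, n):
    """Part(f): group the vectors of {0,1}^n by the output value of f."""
    blocks = {}
    for x in product((0, 1), repeat=n):
        blocks.setdefault(f(x), set()).add(x)
    return {frozenset(block) for block in blocks.values()}

def toggle_equivalent(f1, f2, n):
    """Definition 11 checked pairwise: f1 toggles on (x, x') iff f2 does."""
    xs = list(product((0, 1), repeat=n))
    return all((f1(x) != f1(y)) == (f2(x) != f2(y)) for x in xs for y in xs)

# Two encodings of the same three-valued function (number of ones).
def f_binary(x):
    return (x[0] & x[1], x[0] ^ x[1])

def f_thermo(x):
    return (x[0] | x[1], x[0] & x[1])

# A function that also separates (0, 1) from (1, 0).
def f_identity(x):
    return x

# Single output functions: AND and its negation.
def f_and(x):
    return (x[0] & x[1],)

def f_nand(x):
    return (1 - (x[0] & x[1]),)
```

For example, `toggle_equivalent(f_binary, f_thermo, 2)` holds and the two partitions coincide, while `f_identity` induces a finer partition and is not toggle equivalent to either encoding. The pairwise check is quadratic in the number of input vectors and is meant only for small n.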

So far we have considered toggle equivalence of 
functions with identical sets of arguments. Below, the 
notion of toggle equivalence is extended to the case of 
Boolean circuits with different sets of arguments that are 
related by encoding functions. 

Definition 13. Let X = {x_1, ..., x_n} and Y = {y_1, ..., y_p} be two
disjoint sets of Boolean variables. Denote by Enc(X, Y) a
Boolean function satisfying the following two conditions.

1) There do not exist three vectors x, x', y (where x, x' are
assignments to variables X and y is an assignment to
variables Y) such that x ≠ x' and Enc(x, y) = Enc(x', y) = 1.

2) There do not exist three vectors x, y, y' such that y ≠ y'
and Enc(x, y) = Enc(x, y') = 1.

The function Enc is called an encoding function.
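
The two conditions of Definition 13 can be checked mechanically. The sketch below is an illustration under the assumption that Enc is given by its on-set, i.e. the set of pairs (x, y) with Enc(x, y) = 1; the name `is_encoding_function` and the sample relations are hypothetical.

```python
def is_encoding_function(pairs):
    """Check Definition 13 for the on-set `pairs` of a function Enc.

    Condition 1: no y is paired with two different x.
    Condition 2: no x is paired with two different y.
    Because `pairs` is a set, both hold iff the number of distinct
    x values and of distinct y values equals the number of pairs.
    """
    xs = {x for x, _ in pairs}
    ys = {y for _, y in pairs}
    return len(xs) == len(pairs) and len(ys) == len(pairs)

# Binary code related to thermometer code of a three-valued variable.
enc_ok = {((0, 0), (0, 0)), ((0, 1), (1, 0)), ((1, 0), (1, 1))}
# Two different x related to the same y: violates condition 1.
enc_bad = {((0,), (0,)), ((1,), (0,))}
```

When the check succeeds, `dict(pairs)` gives the one-to-one mapping from X-codes to Y-codes.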

Remark 4. An encoding function can be viewed as
specifying two different encodings of the same multi-valued
variable. Indeed, let V = {x_1, ..., x_k} be the set of all
assignments to variables of X such that Enc(x_i, y) = 1 for
some y. Let W = {y_1, ..., y_k} be the set of all assignments to
variables of Y such that Enc(x, y_i) = 1 for some x. It is not
hard to see that from Definition 13 it follows that |V| = |W|
and there exists a "natural" bijective mapping between V
and W that relates a vector x_i of V and the vector y_j of W for
which Enc(x_i, y_j) = 1. Vectors x_i and y_j can be considered as
codes of the same value of a k-valued variable.
Definition 14. Let f_1: {0,1}^n → {0,1}^m and f_2: {0,1}^p →
{0,1}^k be m-output and k-output Boolean functions and sets
X = {x_1, ..., x_n} and Y = {y_1, ..., y_p} specify their arguments. Let
X = {X_1, ..., X_s} and Y = {Y_1, ..., Y_s} be partitions of X and Y into s
blocks and Enc(X_1, Y_1), ..., Enc(X_s, Y_s) be encoding functions.
Functions f_1 and f_2 are called toggle equivalent under input
encoding function Enc(X, Y) = Enc(X_1, Y_1) ∧ ... ∧ Enc(X_s, Y_s)
if (f_1(x) ≠ f_1(x')) ∧ (Enc(x, y) = Enc(x', y') = 1) ⇒ f_2(y) ≠ f_2(y')
and vice versa: (f_2(y) ≠ f_2(y')) ∧ (Enc(x, y) = Enc(x', y') = 1)
⇒ f_1(x) ≠ f_1(x').

Proposition 6. Let f_1 and f_2 be toggle equivalent under
input encoding function Enc(X, Y) = Enc(X_1, Y_1) ∧ ... ∧
Enc(X_s, Y_s). Then f_1 and f_2 are "implementations" of the
same multi-valued function of s multi-valued arguments.
Proof follows from Proposition 5 and Remark 4.

5. Common specification and toggle
equivalence

In this section, we show that the existence of a CS of
circuits N_1 and N_2 means that N_1, N_2 can be partitioned into
toggle equivalent subcircuits that are connected in N_1 and
N_2 "in the same way".

Definition 15. Let N = (V, E) be a DAG representing a
Boolean circuit (here V, E are sets of nodes and edges of N
respectively). A subgraph N* = (V*, E*) of N is called a
subcircuit if the following two conditions hold:

a) if g_1, g_2 are in V* and there is a path from g_1 to g_2 in N,
then all the nodes of N that are on that path are in V*;

b) if g_1, g_2 of V* are connected by an edge in N, then they are
also connected by an edge in N*.

Definition 16. Let N* be a subcircuit of N. An input of a
gate g of N* is called an input of N* if it is not connected to
the output of some other gate of N*. A gate g of N* is called
an internal gate if all the gates of N whose inputs are fed by
the output of g are in N*. Otherwise, g is called an external
gate. The output of an external gate is called an output of
circuit N*.

Definition 17. Let N* be a subcircuit of N of k inputs and p
outputs. Let N' be the circuit obtained from N by replacing
N* with a k-input, p-output node that inherits all the
connections of subcircuit N* to the gates of N that are not in
N*. We will say that N' is obtained from N by collapsing
the subcircuit N*.

Definition 18. Let {N^1, ..., N^k} be a partition of N into
subcircuits. This partition is called topological if for each
pair of subcircuits N^i, N^j the following condition holds: if
there is a node g' of N^i and a node g'' of N^j that are connected
by a path in N and level(g') < level(g''), then for any pair of
nodes g* and g** (where g* is in N^i and g** is in N^j) it is
true that level(g*) < level(g**) (level(g) is the level of g in
N).
Definition 19. Let N be a Boolean circuit and N^1, ..., N^k be
a topological partition of N into subcircuits. Let T be a
directed graph obtained from the DAG of N using the
following steps:

1) Each subcircuit N^i is collapsed in N (i.e. replaced with a
node G_i with the same number of inputs and outputs as N^i).

2) The outputs of each node G_i are merged into one. If two
(or more) outputs of G_i are connected to inputs of node G_j,
then all the inputs of G_j connected to G_i but one are
removed.

T is called the communication specification
corresponding to the topological partition N^1, ..., N^k.
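
The construction of Definition 19 can be sketched in Python as follows. This is a minimal illustration, assuming the circuit is given as a list of (source, destination) connections and the partition as a mapping from gate names to subcircuit labels; the function name, the gate names and the labels G1, G2, G3 are hypothetical. Storing the edges of T in a set performs step 2, since parallel connections between the same two nodes collapse into one edge.

```python
def communication_spec(edges, part_of):
    """Build the edge set of T from gate-level connections.

    `edges` lists (src, dst) connections between primary inputs and gates.
    `part_of` maps each gate to the label of its subcircuit; primary
    inputs are absent from the map and keep their own names.
    """
    t_edges = set()
    for src, dst in edges:
        a = part_of.get(src, src)
        b = part_of.get(dst, dst)
        if a != b:               # connections inside a subcircuit disappear
            t_edges.add((a, b))  # the set merges parallel connections
    return t_edges

# A five-gate circuit split into three subcircuits.
edges = [("x1", "g1"), ("x2", "g1"), ("x2", "g2"), ("x3", "g3"),
         ("g1", "g4"), ("g2", "g4"), ("g3", "g5"), ("g4", "g5")]
part_of = {"g1": "G1", "g2": "G1", "g3": "G2", "g4": "G3", "g5": "G3"}
T = communication_spec(edges, part_of)
```

In this example the two connections from g1 and g2 to g4 become the single edge from G1 to G3.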

Remark 5. Informally, T describes information flow
between subcircuits N^1, ..., N^k. The output of G_i is
connected to an input of G_j in T iff an output of N^i is
connected to an input of N^j in N.

Remark 6. It is not hard to show that since N^1, ..., N^k is a
topological partition, T is a DAG (i.e. T is acyclic).
Definition 20. Let T be the communication specification of
circuit N with respect to a topological partition N^1, ..., N^k.
Let G_j be the node of T corresponding to subcircuit N^j. The
length of the longest path from an input of T to G_j is called
the level of G_j and N^j (denoted by level(G_j) and level(N^j)
respectively).
Definition 21. Let N_1^1, ..., N_1^k and N_2^1, ..., N_2^k be topological
partitions of single output Boolean circuits N_1, N_2. Let
communication specifications of N_1 and N_2 with respect to
partitions N_1^1, ..., N_1^k and N_2^1, ..., N_2^k be identical. Denote
by Cf(N_1^m, N_2^m), m = 1, ..., k, the correlation functions
computed exactly as described in Section 3.
Namely, we first compute correlation functions for
subcircuits of level 1, then for level 2 and so on. The
correlation function Cf(N_1^m, N_2^m) is obtained from the
function H = Sat(v(N_1^m)) ∧ Sat(v(N_2^m)) ∧ Cf*, the function
Cf* being the conjunction of correlation functions for all the
subcircuits N_1^i, N_2^i whose outputs are connected to inputs of
N_1^m, N_2^m. The function Cf(N_1^m, N_2^m) is obtained from H by
existentially quantifying away all the variables except the
output variables of N_1^m, N_2^m.

Proposition 7. Let f_1 and f_2 be two Boolean functions that
are toggle equivalent under input encoding function
Enc(X, Y) = Enc(X_1, Y_1) ∧ ... ∧ Enc(X_s, Y_s) (see Definition 14).
Let N_1 and N_2 be circuits implementing f_1 and f_2, and V and
W be the output variables of N_1 and N_2. Let Cf(V, W) be the
function obtained from the function Enc(X, Y) ∧ Sat(v(N_1)) ∧
Sat(v(N_2)) by existentially quantifying away all the variables
except variables of V and W. Then Cf(V, W) is an encoding
function.

Proof. Assume the contrary, i.e. Cf(V, W) is not an encoding
function. Suppose, for example, that there are vectors v, v', w
such that v ≠ v' and Cf(v, w) = Cf(v', w) = 1. Since Cf is obtained
by existentially quantifying away some variables, there
must exist vectors z = (x, y, ..., v, w) and z' = (x', y', ..., v', w)
such that

1) input variables of N_1 and N_2 are assigned x, y in z and x',
y' in z', and

2) output variables of N_1 and N_2 are assigned v, w in z and
v', w in z'.

Since v ≠ v', then x ≠ x'. Hence there are vectors x, x', y, y'
such that Enc(x, y) = Enc(x', y') = 1, N_1(x) = v, N_1(x') = v'
and N_2(y) = w, N_2(y') = w. But this means that N_1 and N_2 are
toggle inequivalent and so we have a contradiction.

Proposition 8. Let N_1^1, ..., N_1^k and N_2^1, ..., N_2^k be topological
partitions of functionally equivalent circuits N_1 and N_2. Let
communication specifications T_1 and T_2 of N_1 and N_2 with
respect to these partitions be identical, i.e. T_1 = T_2 = T. Let
each pair of subcircuits N_1^m and N_2^m be toggle equivalent
under input encoding function Cf* (see Definition 21).
Then N_1 and N_2 have a common specification S. The
topology of S is specified by the communication
specification T, and N_1^m and N_2^m are implementations of
the m-th block of S.



Proof. By assumption each pair of circuits N_1^m, N_2^m is
toggle equivalent if their inputs are restricted by the
corresponding correlation functions. According to
Proposition 6, if the correlation functions are encoding
functions, then N_1^m and N_2^m implement the same
multi-valued function. So one just needs to prove that all
correlation functions are encoding functions. Let
level(N_1^m) = level(N_2^m) = 1. Then inputs of N_1^m and N_2^m are
common primary inputs of N_1 and N_2. One can view inputs
of N_1^m and N_2^m as related by the encoding function that is
the conjunction of functions describing equivalency of the
corresponding input variables of N_1^m and N_2^m. According to
Proposition 7, the correlation function relating outputs of
N_1^m and N_2^m is an encoding function. Then by induction on
topological levels one can easily prove that correlation
function Cf(N_1^p, N_2^p) is an encoding function for any value
of p.

6. A method of HLLS

In previous sections we developed some theory that 
allows one to check the correctness of a CS of circuits N_1
and N_2 even if this specification is described implicitly.
Suppose that N_1 and N_2 have the same topological
specification defined by subcircuits N_1^1, ..., N_1^k and N_2^1, ...,
N_2^k and corresponding pairs of subcircuits are toggle
equivalent. Then these two sets of subcircuits give an
implicit representation of a CS of N_1 and N_2. In this section,
we use this result to formulate a method of High-Level 
structure aware Logic Synthesis (HLLS). 




Figure 2. An example of HLLS

Let us describe this method by a simple example.
Suppose circuit N_1 is partitioned into three subcircuits N_1^1,
N_1^2, N_1^3 as shown in Figure 2 on the left. Suppose that we
want to obtain a circuit N_2 with a better performance than
N_1. This can be done by replacing "tall" subcircuits N_1^1,
N_1^2, N_1^3 with "short" and "wide" subcircuits N_2^1, N_2^2, N_2^3
shown in Figure 2 on the right. In the method of HLLS this
replacement is done in the following way. First we pick a
subcircuit of topological level 1, say N_1^1, and synthesize a
"short" subcircuit N_2^1 that is toggle equivalent to N_1^1. The
same procedure applies to the other subcircuit of
topological level 1. The subcircuit N_1^2 is replaced with a
toggle equivalent subcircuit N_2^2. After we are done with
level 1, we move to topological level 2, i.e. to the subcircuit
N_1^3.
However, before the re-synthesis of circuit N_1^3 we
need to compute the necessary correlation functions. The
inputs of N_1^3 are connected to the outputs of circuits N_1^1
and N_1^2, so we need to compute the correlation functions
Cf(N_1^1, N_2^1) and Cf(N_1^2, N_2^2). This computation is done as
described in Section 3. Then we build a subcircuit N_2^3 that
is toggle equivalent to the subcircuit N_1^3. The toggle
equivalence of N_1^3 and N_2^3 is computed not "globally" (in
terms of primary inputs of N_1 and N_2) but locally in terms
of inputs of N_1^3 and N_2^3 related by the correlation functions
Cf(N_1^1, N_2^1) and Cf(N_1^2, N_2^2). Since subcircuits N_1^3 and N_2^3
have only one output, their toggle equivalence means that
N_1^3 and N_2^3 (and hence circuits N_1 and N_2) are functionally
equivalent modulo negation. Since N_1 and N_2 have the same
topological specification and the corresponding subcircuits
are toggle equivalent, N_2 is a different implementation of
the same three-block specification that is implemented by
N_1.
Of course, the described method gives only "the big
picture" because the procedure for synthesis of toggle
equivalent circuits is not described. The generalization of
the method to a circuit $N_1$ with a specification of an
arbitrary number of subcircuits is straightforward. The
subcircuits of a specification of $N_1$ are replaced with toggle
equivalent subcircuits in topological order. Toggle
equivalence of subcircuits $N_1^m$ and $N_2^m$ is computed locally
in terms of inputs of $N_1^m$ and $N_2^m$ related by correlation
functions obtained before.
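
The claim used above, that toggle equivalence of single-output subcircuits amounts to functional equivalence modulo negation, can be checked exhaustively on a small case. The following is only an illustrative sketch, not part of the described method: it assumes both circuits read the same two inputs (so the encoding relation is plain equality), represents each function by its truth table, and all names are hypothetical.

```python
from itertools import product

# All input vectors of a two-input circuit.
INPUTS = list(product([0, 1], repeat=2))

def toggle_equivalent(table1, table2):
    """True iff the two truth tables toggle on exactly the same input pairs."""
    n = len(table1)
    return all((table1[a] != table1[b]) == (table2[a] != table2[b])
               for a in range(n) for b in range(n))

def equal_modulo_negation(table1, table2):
    """True iff table2 is table1 or its bitwise complement."""
    return table1 == table2 or all(u != v for u, v in zip(table1, table2))

# Enumerate all 16 x 16 pairs of single-output functions of two variables.
TABLES = list(product([0, 1], repeat=len(INPUTS)))
claim_holds = all(
    toggle_equivalent(t1, t2) == equal_modulo_negation(t1, t2)
    for t1 in TABLES for t2 in TABLES
)
```

For multi-output subcircuits the same `toggle_equivalent` test applies if each table entry is a tuple of output values, but equality modulo negation no longer follows.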

7. Relation of HLLS to other synthesis 
procedures 

In this section, we discuss the relation between HLLS and
existing synthesis procedures based on local
transformations. The main three differences between such
procedures and HLLS are shown in Table 1. The key idea of
HLLS is to restrict the search to implementations of a
predefined specification. The justification of such an
approach is as follows. Suppose we need to optimize a
combinational circuit $N_1$ that has a high-level specification
$S$ represented as a partition of $N_1$ into subcircuits $N_1^1, \ldots,
N_1^k$. One can view these subcircuits as implementations of
multi-valued blocks $G_1, \ldots, G_k$ of $S$ obtained by encoding
values of variables of $S$ with binary codes (however, it does
not mean that this encoding was performed explicitly).

It is quite possible that the chosen codes are far from
optimal and so a much better implementation $N_2$ of $S$ than
$N_1$ may exist. However, a procedure based on local
transformations, be it don't care based optimization [5] or
SPFDs [6][7], will not be able to find $N_2$. This is because to
produce a different implementation of $S$ one has to perturb
many nodes of $N_1$ at once (when replacing a subcircuit $N_1^m$
with its toggle equivalent counterpart). On the other hand, it
is unlikely that by local transformations one can produce a
circuit (that is not an implementation of $S$) "better" than $N_2$.



Table 1. Comparison of HLLS and procedures based on local
transformations

| Synthesis based on local transformations | HLLS |
|---|---|
| Circuit is optimized node by node | Many nodes are perturbed "at once" |
| Equivalence checking is not an issue | Reducing complexity of equivalence checking is crucial |
| The search space is not restricted | One considers only implementations of a predefined specification |



Of course, the conjecture that the quality of encodings
matters so much is based on the assumption that the
specification $S$ captures essential features of the circuit to
be implemented. An example of a circuit with a
"meaningful" specification is a multiplier. On the other
hand, if $N_1$ has many specifications, sticking to one of
them does not make much sense.

It is not hard to see that HLLS and synthesis 
procedures based on local transformations are 
complementary and so can be used together. An HLLS 
procedure can be used as a first step of logic optimization 
meant to improve encodings of multi-valued variables and 
so to generate a better starting point for a procedure based 
on local transformations. 

8. A procedure for testing toggle equivalence 

Checking toggle equivalence is a key operation of the
method of HLLS described in Section 6. In this section, we
describe a procedure for testing toggle equivalence and give
some experimental results. Let $N_1$ and $N_2$ be two multi-output
Boolean circuits. Denote by $inp(N_1)$, $inp(N_2)$ and
$out(N_1)$, $out(N_2)$ the sets of primary input and output
variables of circuits $N_1$ and $N_2$. Denote by
$Enc(inp(N_1), inp(N_2))$ a function specifying allowable
combinations of inputs for $N_1$ and $N_2$. Let $N^*_1$ and $N^*_2$ be
copies of circuits $N_1$ and $N_2$ respectively that depend on
different sets of variables (shown in Fig. 3). Denote by $H$
the function

$$Sat(v(N_1)) \wedge Sat(v(N_2)) \wedge Sat(v(N^*_1)) \wedge Sat(v(N^*_2)) \wedge Enc(inp(N_1), inp(N_2)) \wedge Enc(inp(N^*_1), inp(N^*_2)).$$

Denote by $Eq(out(N_1), out(N^*_1))$
the function that is equal to 1 iff $out(N_1)$ and $out(N^*_1)$ take
the same value. Denote by $Neq(out(N_1), out(N^*_1))$ the
function that is equal to 1 iff $out(N_1)$ and $out(N^*_1)$ take
different values. Denote by $H_1$ the function
$H \wedge Eq(out(N_1), out(N^*_1)) \wedge Neq(out(N_2), out(N^*_2))$. Denote by
$H_2$ the function
$H \wedge Neq(out(N_1), out(N^*_1)) \wedge Eq(out(N_2), out(N^*_2))$.

Proposition 9. Circuits $N_1$ and $N_2$ are toggle equivalent iff
the functions $H_1$ and $H_2$ (defined above) are constant 0.

If part. Assume the contrary, i.e. $H_1$ and $H_2$ are constant 0
and $N_1$ and $N_2$ are not toggle equivalent. Suppose there are
vectors $(x,h)$ and $(x^*,h^*)$ such that $N_1(x) = N_1(x^*)$ while
$N_2(h) \neq N_2(h^*)$, where $Enc(x,h) = Enc(x^*,h^*) = 1$. Then there
is a vector $w$ that satisfies the function $H_1$ (and so we have
a contradiction). Indeed, let $w$ be the set of assignments to
the variables of circuits $N_1, N_2, N^*_1, N^*_2$ under the input
assignment $(x, h, x^*, h^*)$. The assignment $w$ satisfies
$Eq(out(N_1), out(N^*_1))$ because $N_1(x) = N_1(x^*)$. It also
satisfies $Neq(out(N_2), out(N^*_2))$ because $N_2(h) \neq N_2(h^*)$.
Besides, $w$ satisfies function $H$ because
$Enc(x,h) = Enc(x^*,h^*) = 1$ and $w$ is picked in such a way that
it satisfies $Sat(v(N_1))$, $Sat(v(N_2))$, $Sat(v(N^*_1))$, $Sat(v(N^*_2))$.

Only if part. Assume the contrary, i.e. circuits $N_1$ and $N_2$
(and hence $N^*_1$ and $N^*_2$) are toggle equivalent but either $H_1$
or $H_2$ (or both) is satisfiable for some assignment of values.
Suppose for example that there is an assignment $w$ such that
$H_1(w) = 1$. Let $x$, $h$, $x^*$, $h^*$ be the parts of vector $w$ that are
assignments to the input variables of $N_1, N_2, N^*_1, N^*_2$
respectively. Since vector $w$ satisfies $Sat(v(N_1))$, $Sat(v(N_2))$,
$Sat(v(N^*_1))$, $Sat(v(N^*_2))$, the rest of the assignments in
$w$ are consistent with $x$, $h$, $x^*$, $h^*$, i.e. they are the values of
gates of $N_1, N_2, N^*_1, N^*_2$ under the input vectors $x$, $h$, $x^*$, $h^*$
respectively. Besides, since
$Enc(x,h) = Enc(x^*,h^*) = Eq(out(N_1), out(N^*_1)) \wedge Neq(out(N_2), out(N^*_2)) = 1$,
we have $N_1(x) = N_1(x^*)$ while $N_2(h) \neq N_2(h^*)$. This means
that $N_1$ and $N_2$ are not toggle equivalent and so we have a
contradiction.

Figure 3. Two copies of circuits $N_1$ and $N_2$ to be checked for
toggle equivalence
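
Proposition 9 can be mimicked on tiny circuits by enumerating input assignments instead of calling a SAT solver. The sketch below is an illustration under stated assumptions, not the procedure used in the experiments: circuits are plain Python functions returning output tuples, the hypothetical `enc` predicate plays the role of $Enc$, and a pair of allowed input assignments on which exactly one of the two circuits toggles corresponds to a satisfying assignment of $H_1$ or $H_2$.

```python
from itertools import product

def toggle_equivalent(n1, n2, enc, xs, hs):
    """Return True iff no pair of allowed assignments satisfies H1 or H2."""
    allowed = [(x, h) for x in xs for h in hs if enc(x, h)]
    for (x, h), (x2, h2) in product(allowed, repeat=2):
        eq1 = n1(x) == n1(x2)   # Eq(out(N1), out(N1*))
        eq2 = n2(h) == n2(h2)   # Eq(out(N2), out(N2*))
        if eq1 != eq2:          # eq1 and not eq2 -> H1; not eq1 and eq2 -> H2
            return False
    return True

VECTORS = list(product([0, 1], repeat=2))

def same_inputs(x, h):
    """Illustrative Enc: both circuits receive identical input vectors."""
    return x == h

def half_adder(v):
    return (v[0] & v[1], v[0] ^ v[1])

def permuted(v):
    # Same outputs as half_adder in a different order.
    return (v[0] ^ v[1], v[0] & v[1])

def and_or(v):
    return (v[0] & v[1], v[0] | v[1])

def wires(v):
    return (v[0], v[1])
```

With these toy circuits, `half_adder` is toggle equivalent to `permuted` and to `and_or` (none of the three distinguishes inputs 01 and 10), but not to `wires`, which does.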



In the experiments we used 39 circuits from the
MCNC benchmark set. Namely, for each circuit $N_1$ listed
in Table 2 we checked its toggle equivalence with a circuit
$N_2$ produced from $N_1$ by a random permutation of outputs.
Clearly, permutation of outputs destroys functional
equivalence of $N_1$ and $N_2$ but preserves their toggle
equivalence. To check if $N_1$ and $N_2$ are toggle equivalent
we created CNFs describing functions $H_1$ and $H_2$ and
checked them for satisfiability. (Since tested pairs $N_1$, $N_2$
were toggle equivalent, all generated CNF formulas were
unsatisfiable.) For satisfiability testing we used the
SAT-solver BerkMin [1],[2]. The runtimes are shown in the last
column of Table 2. It is not hard to see that in the majority
of cases, toggle equivalence was established very quickly,
which proves that the proposed procedure may be used in
HLLS.

Table 2. Toggle equivalence checking for MCNC circuits

| name | #inputs | #outputs | time (sec.) |
|---|---|---|---|
| pcler8 | 27 | 17 | 0.01 |
| frg1 | 28 | 3 | 0.03 |
| sct | 19 | 15 | 0.04 |
| unreg | 36 | 16 | 0.04 |
| lal | 26 | 19 | 0.05 |
| c8 | 28 | 18 | 0.07 |
| cht | 47 | 36 | 0.08 |
| b9 | 41 | 21 | 0.09 |
| my_adder | 33 | 17 | 0.16 |
| example2 | 85 | 66 | 0.17 |
| C432 | 36 | 7 | 0.18 |
| apex7 | 49 | 37 | 0.18 |
| vda | 17 | 39 | 0.18 |
| ttt2 | 24 | 21 | 0.33 |
| i5 | 133 | 66 | 0.40 |
| i6 | 138 | 67 | 0.63 |
| term1 | 34 | 10 | 0.74 |
| i7 | 199 | 67 | 0.99 |
| i9 | 88 | 63 | 0.99 |
| K2 | 45 | 43 | 1.46 |
| apex6 | 135 | 99 | 1.58 |
| x4 | 94 | 71 | 1.58 |
| x3 | 135 | 99 | 1.70 |
| x1 | 51 | 35 | 2.28 |
| C499 | 41 | 32 | 2.41 |
| rot | 135 | 107 | 2.97 |
| C880 | 60 | 26 | 5.07 |
| frg2 | 143 | 139 | 5.13 |
| C1355 | 41 | 32 | 6.48 |
| pair | 173 | 137 | 9.22 |
| des | 256 | 245 | 40.39 |
| C1908 | 33 | 25 | 47.10 |
| too_large | 38 | 3 | 70.9 |
| i8 | 133 | 81 | 150.02 |
| C5315 | 178 | 123 | 193.10 |
| C3540 | 50 | 22 | 261.28 |
| dalu | 75 | 16 | 310.67 |
| i10 | 257 | 224 | 588.87 |
| C7552 | 207 | 108 | 6,122.5 |
| Geometric mean | | | 1.84 |
| Arithmetic mean | | | 200.77 |



9. Conclusions 

We introduce a method of High-Level structure aware 
Logic Synthesis (HLLS). The key idea of the method is to 
re-encode multi-valued variables of the specification 
describing the high-level structure of the circuit implicitly 
at the gate level. We show that this can be done by 
preserving toggle equivalence among initial and re- 
synthesized implementations of multi-valued blocks. We 
show that toggle equivalence can be efficiently checked for 
relatively large pieces of combinational logic, which makes 
HLLS a promising direction for future research.

Abstract

In this paper, we propose a new logic synthesis methodology to deal 
with the increasing importance of the interconnect delay in deep- 
submicron technologies. We first show that conventional logic syn- 
thesis techniques can produce circuits which will have long paths even 
if placed optimally. Then, we characterize the conditions under which 
this can happen and propose logic synthesis techniques which produce 
circuits which are "better" for placement. Our proposed approach still
separates logic synthesis from physical design. 

1 Introduction 

Conventional logic synthesis assumes that the delay of a circuit
depends only on the delays of the gates in the circuit and mostly
ignores the effect of interconnect delay. However, as we move
towards smaller geometries, interconnect delay is becoming an
increasingly larger fraction of the total delay. In fact, Semiconductor
Industrial Alliance's National Technology Roadmap for Semiconductor for
1997 [1] predicts that interconnect delay will start dominating the total
gate delay as we move down to 0.15µ technology and below. Another
study by Keutzer et al. [5] shows that for 0.25µ technology and below,
interconnect delay can contribute anywhere from 50% to 80% of the
total delay. Therefore, logic synthesis can no longer afford to ignore
the effect of interconnect delay during optimization.

In this paper, we adopt a diametrically opposite approach to that of
conventional logic synthesis. We perform logic synthesis to optimize
only for interconnect delay, ignoring the effect of gate delays. Our
approach is based on the simple observation that if an output $o$ depends
on an input $i$, then the best way to connect $i$ to $o$ is through a path which
is monotonic from $i$ to $o$, that is, there are no diversions in the path
from $i$ to $o$. We first show, by means of an example, that conventional
logic synthesis can produce a circuit for which it is impossible to find a
placement with no diversions in the input-output paths. Therefore, no
matter how good a place & route tool is, it may not be able to produce
a circuit which is optimal in terms of interconnect delay.

We define the notion of illegal nodes. Intuitively speaking, a node
is illegal if it introduces a diversion in the circuit no matter where it is
placed. We characterize the condition under which a node is illegal and
give a procedure to convert an arbitrary circuit into a circuit which has
only legal nodes. We call such a circuit a legal circuit. We show that
for a legal circuit, there always exists a point placement of the nodes
such that every input-output path is monotonic. We also provide a set
of logic synthesis transformations which are guaranteed to preserve
the "legality" of a circuit.

The proposed approach has the advantage that it still maintains a 
distinction between the logic synthesis and place & route stages. It 
does not need to tightly couple synthesis and placement by frequently 
alternating between the two which can be inefficient and may not con- 
verge at all. 

2 Previous Work 

So far very little work has been done to model the effect of
interconnect delay at the logic level. This is mainly due to the fact that at the
logic level, very little information is available about the interconnect.
Most of these approaches [9, 8, 14] use a rough companion placement
to estimate the cost of various logic synthesis operations and make
decisions based on this cost. In [13] an iterative approach to combine
synthesis and placement is presented. Instead of using a companion
placement to guide synthesis, they use actual placement which can be
modified incrementally based on the netlist changes. In [15] a heuristic
to minimize the layout cost is proposed which doesn't employ a
companion placement solution. The method in [15] is based on
minimizing the average fanout range and evenly distributing fanouts in the
chip. It was shown that the chip delay could be reduced by this
approach if all the input pins are located on one side of the chip and all
the output pins on the opposite. Like [15], our approach also does not
employ a companion placement. We analyze conditions under which
a netlist is not "good" for placement given the locations of i/o pins and
try to transform it into one which is.

3 Preliminaries 

Definition 1 A logic circuit $L$ is a 3-tuple $(I, O, \mathcal{F})$. $I$ is a set of
primary input pins or simply primary inputs, $O$ is a set of primary
output pins or simply primary outputs. Each element of $I$ and $O$ is a
binary variable. An element $f_j \in \mathcal{F}$ is a function $f_j : B^{|I|} \to B$. Each
$f_j$ is called the global function of the primary output $o_j$.

A logic circuit is represented by a Boolean network [3]. If $n_k$ is an
immediate fanout of $n_j$ in the Boolean network, we write $n_j \to n_k$.
A logic circuit is pin-assigned if each primary input $i$ is labeled with
a position $(x_i, y_i)$ and each primary output $o$ with position $(x_o, y_o)$.
A logic circuit $L$ is placed if every node $n$ of the Boolean network
representing $L$ has a position, i.e. every node $n$ is labeled with $(x_n, y_n)$,
and the resulting placement is denoted by $\mathcal{P}_L$. A point placement of $L$
is a placement of $L$ where each node is represented as a point. Given a
point-placed circuit, a path $p_{(i,o)}$ from a primary input $i$ to a primary
output $o$ is a sequence of connected nodes from $i$ to $o$, and the length
of the path, $d_{(i,o)}$, is the length of all the wires along the path from $i$
to $o$. The path $p_{(i,o)}$ is called monotonic if its length is equal to the
Manhattan distance from $i$ to $o$. The placement $\mathcal{P}_L$ of $L$ is optimal if
there is no other placement of $L$ whose length of the longest path is
shorter than that of $\mathcal{P}_L$.
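
The monotonicity test for a point-placed path is simple to state in code. The following is a minimal sketch under one assumption not spelled out in the text: each wire between consecutive nodes is routed with length equal to the Manhattan distance between its endpoints. Function names are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two points given as (x, y) pairs."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def path_length(points):
    """Total wire length along a path given as a list of node positions."""
    return sum(manhattan(a, b) for a, b in zip(points, points[1:]))

def is_monotonic(points):
    """A path is monotonic iff its length equals the Manhattan distance
    between its first point (primary input) and last point (primary output)."""
    return path_length(points) == manhattan(points[0], points[-1])

# A path with no diversion, and one whose middle node overshoots in x.
straight = [(0, 0), (2, 1), (3, 4)]
diverted = [(0, 0), (4, 1), (3, 4)]
```

Here `straight` has length 7, which equals the Manhattan distance from (0, 0) to (3, 4), while `diverted` has length 9 and is therefore not monotonic.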

The coordinate system that we use in the paper assumes that the
x-axis goes from left to right and the y-axis goes from top to bottom.

4 Problem Description 

Given a logic circuit $L$, the goal is to find a placed circuit $\mathcal{P}_L$ such
that the interconnect delay of the circuit is minimized. For efficiency
reasons, we want to maintain the decoupling of the problem
into a separate synthesis phase followed by a place & route phase as
in the conventional approach. Given a logic circuit $L = (I, O, \mathcal{F})$, we
address the problem of finding a Boolean network $\mathcal{N}_L$ which, when
placed optimally, leads to a circuit with minimum interconnect delay.
It is up to the placement tool to find the optimal placement for such
a network. Intuitively speaking, we are trying to create a circuit for
which a "good" placement exists.
We assume that the die is represented by a rectangle $R$ with width
and height and the given logic circuit is pin-assigned. We assume
that the delay of a path is a linear function of its length. In
general, the interconnect delay depends quadratically on the length of
the interconnect. However, it can be made linear by buffer insertion
and wire sizing, as shown in recent studies by Otten and Brayton [7]
and Cong and Pan [4]. A circuit is said to be optimal in terms of
interconnect delay if the length of a path from any primary input $i$ to any
primary output $o$ is its Manhattan distance (monotonic), i.e.

$$d_{(i,o)} = |x_i - x_o| + |y_i - y_o|$$

This definition is motivated by the pin-to-pin delay model of
Kukimoto and Brayton [6]. Under this model, a delay number is assigned
for every input-output pair. This model is particularly suited for
intellectual property (IP) blocks where the arrival times of the pins are
not known in advance. Consequently, any input-output path can end
up being a critical path. Therefore, to minimize the delay, we have to
minimize the delay for all input-output paths. We call this problem the
IP-based synthesis problem.

We will also be addressing a slightly different problem called the
slack-based synthesis problem, where the only difference from the
IP-based problem is the objective function. Instead of minimizing the
length of the path from any primary input to any primary output, we
minimize the longest path of the circuit, i.e.

$$\min \{ \max_{i,o} d_{(i,o)} \}$$

In this paper, we will mainly focus on the IP-based synthesis
problem. However, the approach can be modified to address the
slack-based problem as well. We will very briefly discuss this in Section 7.
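
The two objectives can be contrasted on a toy placement. This is a hedged sketch only: the pin positions and path lengths below are made-up numbers, and the dictionary layout and function names are assumptions made for illustration.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def is_ip_optimal(lengths, pins):
    """IP-based criterion: every input-output path is monotonic."""
    return all(d == manhattan(pins[i], pins[o]) for (i, o), d in lengths.items())

def slack_objective(lengths):
    """Slack-based objective: length of the longest input-output path."""
    return max(lengths.values())

# Hypothetical pin assignment and path lengths of some placed circuit.
pins = {"a": (0, 0), "b": (0, 4), "y": (5, 2)}
lengths = {("a", "y"): 7, ("b", "y"): 9}
```

In this toy instance the path from `a` is monotonic (7 equals the Manhattan distance) but the path from `b` is not (9 against 7), so the placement is not IP-optimal, and its slack-based cost is 9.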



5 Approach 

To understand the problem better, let us first look at an example where 
the conventional logic synthesis which considers only gates during op- 
timization may not be able to find a circuit with minimum interconnect 
delay. 




Figure 1: Network $\mathcal{N}_{min}$ and its optimal placement

5.1 Logic Synthesis and Interconnect Delay: An Example

Let us consider a minimum literal Boolean network $\mathcal{N}_{min}$ with 10
literals as shown in Figure 1 on the left. Assuming that the pin positions
are given, the optimal placement of $\mathcal{N}_{min}$ is shown in Figure 1 on the
right. Pins $e$ and $f$ are not shown and are assumed to be close to $y_1$.
In this solution, there are two longest paths of equal length, i.e. one
path from $b$ to $y_1$ and the other from $b$ to $y_2$. This circuit is not optimal
in terms of both the IP-based and the slack-based synthesis problems
because there is a better decomposition of the circuit that produces
shorter longest paths. The better decomposed network $\mathcal{N}'$ with 11 literals
is shown in Figure 2 together with its optimal placement. Although
network $\mathcal{N}_{min}$ has fewer literals than $\mathcal{N}'$, it has an extra path from $b$
to $y_2$. Consequently, the placement tool places node $z$ to minimize the
longest paths from $b$ to $y_1$ and $y_2$. However, as we see in Figure 2, $y_2$
is independent of $b$ and therefore $b$ can be removed from the support
of $y_2$. This leads to the network in Figure 2 whose optimal placement
has a shorter longest path as compared to $\mathcal{N}_{min}$.

Figure 2: Network $\mathcal{N}'$ and its optimal placement

Although network $\mathcal{N}'$ is better than $\mathcal{N}_{min}$ in terms of both IP-based
and slack-based synthesis problems, there is yet a better decomposition
for the IP-based synthesis problem. In $\mathcal{N}'$, the path from $c$ to $y_1$
is greater than its Manhattan distance. The same is true for the path
from $d$ to $y_2$. A better decomposed circuit with 11 literals and its
optimal placement are shown in Figure 3.

Figure 3: Network $\mathcal{N}''$ and its optimal placement.

From the example above, we see that sometimes the output of a 
logic synthesis is not "good" for placement, i.e. no matter how we 
place the nodes, there is at least one path which is longer than its Man- 
hattan distance. In our approach, the aim is to guide logic synthesis 
such that it produces a circuit which is good for placement. It is up to 
the placement tool to find the optimal placement for the decomposed 
circuit in the placement phase. 

In this section we define what we mean by a circuit which is "good"
for placement and then give a set of transformation rules which can
find such a circuit. Our approach can be divided into two broad stages:
constraint generation and constraint driven synthesis. In the constraint
generation step, we partition the die into regions and identify the types
of functions that are allowed to fill them. We define the notion of
illegal nodes. Intuitively speaking, a node is illegal if it cannot be
placed somewhere on the die without causing a diversion in the circuit.
We show that if a circuit consists of only legal nodes then there exists
a point placement of the nodes such that every input-output path is
monotonic. We call such a circuit a legal circuit. We characterize the
condition under which a node is illegal and give a procedure to convert
an arbitrary circuit into a legal circuit.

Since nodes have areas, in the constraint driven synthesis step, we
synthesize the legal circuit to find another legal circuit with
minimum area. We extend the algebraic transformations and don't care
minimization such that they operate on legal nodes and produce legal
nodes. As in the conventional logic synthesis case, we use the number
of factored-form literals as our area estimate since it has been proven
to be a good indication of the size of a Boolean network.

5.2 Constraint Generation

Since the length of every path from a primary input to a primary output 
is restricted to its Manhattan distance (monotonic), there is a well de- 
fined region where a Boolean node can be placed. Let us define region 
formally. 



5.2.1 Region Placement Constraints

The example above illustrates that if there is a path from a primary
input $i$ to a primary output $o$, then for the path to be monotonic, all the
logic gates along the path should be placed in the region $r_{(i,o)}$. This
leads us to first partition the die into rectangles along the pin positions
and label each region with functions that can be placed in it. Continuing
with our example, the die area associated with $y_1$, $y_2$, $a$, $b$, $c$, and $d$
is partitioned into regions $\mathcal{R} = \{r_1, r_2, r_3, r_4, \ldots\}$ as shown in Figure 6.
Region $r_1$ is labeled with $\{a,b,c,d\}_{y_1}$ to mean that factors whose
supports are a subset of $\{a,b,c,d\}$ and that transitively fan out only to $y_1$ are
allowed to be placed in $r_1$. Region $r_3$ is labeled with $\{c,d\}_{y_1}$ and
$\{a,b\}_{y_2}$ to mean that factors whose supports are a subset of $\{c,d\}$ and
that transitively fan out only to $y_1$, or factors whose supports are a subset
of $\{a,b\}$ and that transitively fan out only to $y_2$, are allowed to be placed
in $r_3$. Other regions are labeled in a similar fashion. Referring back to
Boolean network $\mathcal{N}'$, we see that node $z$ is a "good" node and can be
placed in $r_1$ because its support set is $\{a,c,d\}$ and it transitively fans
out only to $y_1$. This matches the label of $r_1$. Node $x$ is not a "good"
node because there is no region whose label contains both its support set
$\{c,d\}$ and its two transitive fanouts $y_1$ and $y_2$.



Definition 2 A region $r = \{x_l, y_t, x_r, y_b\}$, where $x_l \le x_r$ and $y_t \le y_b$,
is the set of all points in the rectangle bounded at opposite corners by
the points $(x_l, y_t)$ and $(x_r, y_b)$. Mathematically,
$r = \{(x,y) \mid x_l \le x \le x_r \text{ and } y_t \le y \le y_b\}$.

Definition 3 Given two points $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$, the
region defined by $p_1$ and $p_2$ is region
$r_{(p_1,p_2)} = \{\min(x_{p_1}, x_{p_2}), \min(y_{p_1}, y_{p_2}), \max(x_{p_1}, x_{p_2}), \max(y_{p_1}, y_{p_2})\}$.
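
Definitions 2 and 3 map directly onto a small data type. The sketch below is illustrative only; the class and function names are hypothetical, and regions are treated as closed rectangles so that a degenerate region (a line or a point) is still a valid region.

```python
from typing import NamedTuple

class Region(NamedTuple):
    """Closed rectangle {x_l, y_t, x_r, y_b}; the y-axis points downward."""
    x_l: float
    y_t: float
    x_r: float
    y_b: float

    def contains(self, x, y):
        """Membership test from Definition 2."""
        return self.x_l <= x <= self.x_r and self.y_t <= y <= self.y_b

def region_of(p1, p2):
    """Region defined by two points (Definition 3)."""
    return Region(min(p1[0], p2[0]), min(p1[1], p2[1]),
                  max(p1[0], p2[0]), max(p1[1], p2[1]))

# The order of the two points does not matter.
r = region_of((3, 1), (0, 4))
```

For a primary input at (0, 4) and a primary output at (3, 1), every node on a monotonic path between them has to lie inside `r`.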

With these definitions, we go back to analyze why node $z$ of the
Boolean network $\mathcal{N}'$ in Figure 2 is "good" but not $x$. Because node $z$
fans out to $y_1$ and its support set is $\{a,b,c,d\}$, $z$ should be placed in the
legal region shown in Figure 4, so that the path from any primary
input in the support set, i.e. $a$, $b$, $c$, or $d$, to $y_1$ is monotonic. For the
same example, there is no good region to place node $x$ because there
are two conflicting requirements. One requirement says that node $x$
should be placed in region $r_{(y_1,c)}$, which is $r_1$ in Figure 5, for the path
from $c$ to $y_1$ to be monotonic; while the other says that node $x$ should
be placed in region $r_{(y_2,d)}$, which is $r_2$ in Figure 5, for the path from $d$
to $y_2$ to be monotonic. As shown in the figure, $x$ cannot be placed in
both $r_1$ and $r_2$. Hence, $x$ is not a desirable factor.

Figure 4: Legal region of node $z$.

Figure 5: Conflicting legal region requirements for $x$.

Figure 6: Regions and labels of regions.



Definition 4 A placement constraint $d$ is a 2-tuple $(O^d, \sigma^d)$, where
$O^d \subseteq O$ and $\sigma^d \subseteq I$. $O^d$ is called the output set and $\sigma^d$ the support
set of $d$. We also write $d$ as $\{i_1, i_2, \ldots\}_{o_1, o_2, \ldots}$ where $\sigma^d = \{i_1, i_2, \ldots\}$
and $O^d = \{o_1, o_2, \ldots\}$.

Each region is labeled with a set of placement constraints, e.g. $r_1$ is
labeled with $\{a,b,c,d\}_{y_1}$ and $r_3$ is labeled with $\{c,d\}_{y_1}$ and $\{a,b\}_{y_2}$
as shown in Figure 6. A placement constraint on a region $r$ is called
its region placement constraint.

Hence, each region placement constraint $d_r = (O^r, \sigma^r)$ in a region $r$
denotes that Boolean nodes that fan out only to a subset of the primary
outputs in $O^r$ and have at most $\sigma^r$ in their support can be placed in $r$.

5.2.2 Node Placement Constraints 

We see that given a region $r$, only certain types of nodes can be placed
in $r$ and this is captured in its region placement constraint. We now
define the dual for nodes. Given a node $n$, it can only be placed in
certain regions. For example, node $z$ of Boolean network $\mathcal{N}'$ in Figure 2
can only be placed in region $r_1$ as shown in Figure 6. Hence, we label
each node with a placement constraint and it is called its node
placement constraint. The node placement constraint of node $n$ denotes the
support of $n$ and its transitive primary outputs. For example, the node
placement constraint of $z$ of Boolean network $\mathcal{N}'$ is $\{a,b,c,d\}_{y_1}$.
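
One way to compute the support set and the transitive primary outputs of every node is a forward and a backward pass over the network. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses a topological order from Python's `graphlib` rather than an explicit breadth-first traversal, represents the network as a hypothetical `fanins` dictionary, treats primary outputs as ordinary nodes, and runs on a made-up network loosely modeled on the example (a node `x` feeding both outputs, a node `z` feeding only `y1`).

```python
from graphlib import TopologicalSorter

def node_placement_constraints(fanins, primary_inputs, primary_outputs):
    """Return {node: (support set, output set)} for all non-input nodes."""
    # static_order() lists every node after all of its fanins.
    order = list(TopologicalSorter(fanins).static_order())

    # Forward pass: support of a node is the union of its fanins' supports.
    support = {}
    for n in order:
        if n in primary_inputs:
            support[n] = {n}
        else:
            support[n] = set().union(*(support[f] for f in fanins.get(n, ())))

    # Backward pass: push reachable primary outputs from fanouts to fanins.
    outputs = {n: set() for n in order}
    for o in primary_outputs:
        outputs[o].add(o)
    for n in reversed(order):
        for f in fanins.get(n, ()):
            outputs[f] |= outputs[n]

    return {n: (frozenset(support[n]), frozenset(outputs[n]))
            for n in order if n not in primary_inputs}

# Hypothetical network: x = f(c, d), z = g(a, b, x), y1 = z, y2 = x.
fanins = {"x": ["c", "d"], "z": ["a", "b", "x"], "y1": ["z"], "y2": ["x"]}
constraints = node_placement_constraints(fanins, set("abcd"), ["y1", "y2"])
```

In this toy network `z` gets the constraint $\{a,b,c,d\}_{y_1}$ and `x` gets $\{c,d\}_{y_1,y_2}$.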

The node placement constraints of nodes of a Boolean network can
be easily computed by traversing the Boolean network in a
breadth-first manner from the primary inputs to compute the support sets and
from the primary outputs to compute the output sets.

5.2.3 Properties of Placement Constraints on Boolean Networks

In this section, we show what "good" nodes mean and that having a
Boolean network with only "good" nodes can lead to a monotonic
point placement of the network.

Intuitively, a "good" node is one that can be placed in a region. We
define such "good" nodes as legal. However, before we can formally
define the legality of a node, we need the definition of containment of
placement constraints.

Definition 5 Placement constraint $d_a = (O^a, \sigma^a)$ is contained in
placement constraint $d_b = (O^b, \sigma^b)$, denoted as $d_a \subseteq d_b$, if $O^a \subseteq O^b$
and $\sigma^a \subseteq \sigma^b$.

Definition 6 Boolean node $n$ with node placement constraint $d_n$ is
legal with respect to region $r$ with region placement constraints
$\{d_{r_1}, d_{r_2}, \ldots\}$, denoted as $n \in r$, if there exists a $j$ such that $d_n \subseteq d_{r_j}$.

Definition 6 says that node $n$ is legal with respect to region $r$ if $n$
can be placed in $r$.

Definition 7 A Boolean node $n$ is legal if there is a region $r$ such that
$n \in r$.

Definition 7 says that node $n$ is legal if there is a region $r$ where $n$
can be placed. This definition and Definition 6 are about the legality
of a Boolean node. Now given a node, the next definition defines the
region in which the node is legal.

Definition 8 The legal regions of a node $n$, denoted as $R(n)$, is the set
of regions $\{r_1, r_2, \ldots, r_l\}$ such that for any region $r_j$ in the set, $n \in r_j$.

For clarity purposes, we denote the legal region of a node $n$ with
node placement constraint $d_n$ as $R(d_n)$. We will then assume that given
a node placement constraint, the node is implicitly defined.
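
Definitions 5 through 8 reduce to subset tests. The sketch below is illustrative: a placement constraint is modeled as a `(support, outputs)` pair of frozensets, the region labels are the two labels quoted in the text, and the region names and all function names are assumptions.

```python
def contained(d_a, d_b):
    """Definition 5: both the support set and the output set are subsets."""
    (sup_a, out_a), (sup_b, out_b) = d_a, d_b
    return sup_a <= sup_b and out_a <= out_b

def legal_in(d_n, region_constraints):
    """Definition 6: some region placement constraint contains d_n."""
    return any(contained(d_n, d_r) for d_r in region_constraints)

def legal_regions(d_n, labels):
    """Definition 8: all regions in which the node is legal."""
    return [r for r, constraints in labels.items() if legal_in(d_n, constraints)]

def constraint(support, outputs):
    """Helper building a (support, outputs) pair."""
    return (frozenset(support), frozenset(outputs))

# Partial labeling taken from the running example (region names assumed).
labels = {
    "r1": [constraint("abcd", ["y1"])],
    "r3": [constraint("cd", ["y1"]), constraint("ab", ["y2"])],
}
z = constraint("abcd", ["y1"])
x = constraint("cd", ["y1", "y2"])
```

With this partial labeling, `z` is legal in `r1` only, while `x` is legal in no listed region because no single label covers both of its transitive fanouts.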

It can be easily seen that $R(\{i_k\}_{o_l})$ is the region $r_{(i_k,o_l)}$. If we define
$R(d_1) \cap R(d_2)$ to be the overlapping region between $R(d_1)$ and $R(d_2)$,
then it is easy to see that $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n})$ is equal to:

$$
\begin{aligned}
& R(\{i_1\}_{o_1}) \cap R(\{i_2\}_{o_1}) \cap \cdots \cap R(\{i_m\}_{o_1}) \;\cap \\
& R(\{i_1\}_{o_2}) \cap R(\{i_2\}_{o_2}) \cap \cdots \cap R(\{i_m\}_{o_2}) \;\cap \\
& \cdots \;\cap \\
& R(\{i_1\}_{o_n}) \cap R(\{i_2\}_{o_n}) \cap \cdots \cap R(\{i_m\}_{o_n})
\end{aligned}
$$

This is called the intersection rule. For example, as shown in Figure 7,
for node $z$ of Boolean network $\mathcal{N}'$:

$$R(z) = R(\{a,b,c,d\}_{y_1}) = R(\{a\}_{y_1}) \cap R(\{b\}_{y_1}) \cap R(\{c\}_{y_1}) \cap R(\{d\}_{y_1})$$

Figure 7: Region intersection for node $z$ of $\mathcal{N}'$

Based on Definition 7, the legality of a node $n$ with node placement
constraint $d_n = (O^n, \sigma^n)$ can be checked by traversing all regions and
checking if $n$ is legal for each region. Assuming $|I| > |O|$, the complexity
for this algorithm is $O(|I|^2 |O|)$ because the number of regions is
$O(|I||O|)$ and the number of region placement constraints in a region
is $O(|I| + |O|)$. A better algorithm would be to check if the legal
region of $n$ is empty or not. This can be done by using the intersection
rule defined above. The complexity is then $O(|O^n||\sigma^n|)$, which can be
much smaller. However, there is a linear algorithm with complexity
$O(|O^n| + |\sigma^n|)$ according to the next three lemmas.

Lemma 1 below says that nodes that transitively fan out to only one
output are always legal.

Lemma 1 For a node placement constraint $\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}$
with $n = 1$, $R(\{i_1, i_2, \ldots, i_m\}_{o_1}) \neq \emptyset$.

Proof: For $\{i_1, i_2, \ldots, i_m\}_{o_1}$, the point $(x_{o_1}, y_{o_1})$ is in
$R(\{i_1, i_2, \ldots, i_m\}_{o_1})$. ■

Lemma 2 below enumerates the cases when nodes that transitively
fan out to two outputs are legal.

Lemma 2 For a node placement constraint $\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}$
with $m \ge 2$ and $n = 2$, $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}) \neq \emptyset$ iff

1. $(\forall i \forall o\; x_i \ge x_o \wedge y_i \ge y_o) \vee (\forall i \forall o\; x_i \ge x_o \wedge y_i \le y_o) \vee (\forall i \forall o\; x_i \le x_o \wedge y_i \ge y_o) \vee (\forall i \forall o\; x_i \le x_o \wedge y_i \le y_o)$, or

2. $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2})$ is a point, i.e. $x_{o_1} = x_{o_2} \wedge \forall i\; y_i = C$, or
$y_{o_1} = y_{o_2} \wedge \forall i\; x_i = C$ for some $C \in \Re$.

Proof: If part:

1. Let us assume without loss of generality that $(\forall i \forall o\; x_i \ge x_o \wedge y_i \ge y_o)$,
and let $i_{min} = (\min\{x_i\}, \min\{y_i\})$ and $o_{max} = (\max\{x_o\}, \max\{y_o\})$;
then the legal region is $r_{(i_{min}, o_{max})}$ and it is not empty.

2. If the legal region is a point, then it is not empty.

Only if part: Without loss of generality assume that the legal region
is not empty and it is not a point, but $x_{i_1} < x_{o_1} < x_{i_2}$, i.e. $o_1$ is on the
top side of the die; then $R(\{i_1, i_2\}_{o_1})$ is a point if both $i_1$ and $i_2$ are
on the top side as well (Figure 8a); it is a line otherwise (Figure 8b).
Since $R(\{i_1, i_2\}_{o_1, o_2}) = R(\{i_1, i_2\}_{o_1}) \cap R(\{i_1, i_2\}_{o_2})$, it is not empty iff
$y_{i_1} = y_{i_2}$ and $x_{o_1} = x_{o_2}$, i.e. $o_1$ and $o_2$ are at opposite sides (Figure 8c).
If we have more than two inputs, then they all have to be either on
the top or the bottom side of the chip for the legal region to be
non-empty and the legal region has to be a point (Figure 8d). Hence, it is a
contradiction. ■

Figure 8: Figure for proof of Lemma 2.

The following lemma says that if a node transitively fans out to more
than two outputs, then there can only be one case where it is legal.






Lemma 3 For a node placement constraint $\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}$
with $m \ge 2$ and $n > 2$, $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}) \neq \emptyset$ iff
$(\forall i \forall o\; x_i \ge x_o \wedge y_i \ge y_o) \vee (\forall i \forall o\; x_i \ge x_o \wedge y_i \le y_o) \vee (\forall i \forall o\; x_i \le x_o \wedge y_i \ge y_o) \vee (\forall i \forall o\; x_i \le x_o \wedge y_i \le y_o)$.

Proof: The proof is similar to the proof of Lemma 2.

If part: This is the same as the first case of the if part of the Lemma 2
proof.

Only if part: Without loss of generality assume that the legal region
is not empty but $x_{i_1} < x_{o_1} < x_{i_2}$, i.e. $o_1$ is on the top side of the die;
then $R(\{i_1, i_2\}_{o_1})$ is a point if both $i_1$ and $i_2$ are on the top side as well
(Figure 8a); it is a line otherwise (Figure 8b). Since $R(\{i_1, i_2\}_{o_1, o_2}) =
R(\{i_1, i_2\}_{o_1}) \cap R(\{i_1, i_2\}_{o_2})$, it is not empty iff $y_{i_1} = y_{i_2}$ and $x_{o_1} = x_{o_2}$,
i.e. $o_1$ and $o_2$ are at opposite sides (Figure 8c). There is no way to add
a third output to $\{i_1, i_2\}_{o_1, o_2}$ with a non-empty legal region. Hence, it
is a contradiction. ■

By the input-output symmetric nature of legal regions, the above
three lemmas apply with the roles of m and n interchanged.

Let the condition $(\forall i \forall o\ x_i > x_o \wedge y_i > y_o) \vee (\forall i \forall o\ x_i > x_o \wedge y_i < y_o) \vee
(\forall i \forall o\ x_i < x_o \wedge y_i > y_o) \vee (\forall i \forall o\ x_i < x_o \wedge y_i < y_o)$ be called the
non-overlapping condition. Then, with these three lemmas, the legality of
a node with node placement constraint $\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}$ can be
checked with the following algorithm:

1. If n is 1, then the node is legal.

2. If the non-overlapping condition is true, then the node is legal.
This can be checked in O(m+n) by first finding the largest and
smallest x and y coordinates of both inputs and outputs and then
checking the non-overlapping condition using these values.

3. If the node placement constraint satisfies Condition 2 of
Lemma 2, then it is legal.

4. If none of the above are satisfied, then the node is illegal.

Each step takes at most O(m+n) time, so this legality checking
algorithm is O(m+n), i.e. linear in the size of the constraint.
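
As an illustration of step 2, the non-overlapping condition can be evaluated from the coordinate extremes alone. The following is a minimal sketch, not the authors' implementation; the function name and the representation of pins as (x, y) tuples are assumptions made for this example. It relies on the fact that the four-way disjunction is equivalent to "all inputs lie strictly to one side of all outputs in x" and likewise in y.

```python
def non_overlapping(inputs, outputs):
    """Check the non-overlapping condition in O(m + n).

    inputs, outputs: non-empty lists of (x, y) pin locations
    (hypothetical representation chosen for this sketch).
    """
    xi_min = min(x for x, _ in inputs)
    xi_max = max(x for x, _ in inputs)
    yi_min = min(y for _, y in inputs)
    yi_max = max(y for _, y in inputs)
    xo_min = min(x for x, _ in outputs)
    xo_max = max(x for x, _ in outputs)
    yo_min = min(y for _, y in outputs)
    yo_max = max(y for _, y in outputs)

    # Every input is strictly right of (or strictly left of) every output.
    x_separated = xi_min > xo_max or xi_max < xo_min
    # Every input is strictly above (or strictly below) every output.
    y_separated = yi_min > yo_max or yi_max < yo_min
    return x_separated and y_separated
```

For example, `non_overlapping([(5, 5), (6, 7)], [(1, 1), (2, 3)])` holds, while a constraint whose output lies between two inputs in x does not satisfy the condition.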

Corollary 5.1 There exists a corner point $p_c$ of
$R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n})$ that is closest in distance to all
outputs, and a corner point $p_f$ farthest from all outputs. The point $p_c$ is
called the closest point of the region and $p_f$ the farthest point.

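Lemma 4 1. If $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}) \cap R(\{i_k\}_{o_1, o_2, \ldots, o_n})$,
where $i_k \notin \{i_1, i_2, \ldots, i_m\}$, is not empty, then it contains the
closest point of $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n})$.

2. If $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}) \cap R(\{i_1, i_2, \ldots, i_m\}_{o_k})$, where
$o_k \notin \{o_1, o_2, \ldots, o_n\}$, is not empty, then it contains the farthest point
of $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n})$.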

Proof: Assume that m > 2 and n > 2. The proof is similar for other
cases.

1. Assume $(\forall i \forall o\ x_i > x_o \wedge y_i > y_o)$ (the proofs of the other cases
are the same); then $x_k > x_o \wedge y_k > y_o$. If $x_k$ is greater than the
x-coordinates of any other input, then $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}) \cap
R(\{i_k\}_{o_1, o_2, \ldots, o_n}) = R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n})$. If $x_k$ is less than
the x-coordinate of all other inputs, then the vertical line going
through $i_k$ partitions $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n})$ into two regions
and $R(\{i_1, i_2, \ldots, i_m\}_{o_1, o_2, \ldots, o_n}) \cap R(\{i_k\}_{o_1, o_2, \ldots, o_n})$ is the partition
that includes the closest point.

2. The proof is similar to case 1.

■

Lemma 4 says that: 

1. Adding inputs to a node placement constraint will not change the
closest point of its legal region.

2. Adding outputs to a node placement constraint will not change 
the farthest point of its legal region. 

At this point, we have defined what legal nodes are and how to 
check for legality of nodes. We now put the legal context into Boolean 
networks and discuss the implication of legality of Boolean network 
on placement. 

Definition 9 A Boolean network is legal if every node in the network
is legal.

There is a nice property of a legal Boolean network as described by 
the following theorem. 

Theorem 5.1 Given a legal Boolean network, there exists a monotonic
point placement for the network.

Proof: 

This is an induction proof. We traverse the Boolean network in a 
reverse topological order, i.e. a node is visited only after all its fanouts 
have been visited. 

The base case is where we have all primary outputs. Let o be an
arbitrary primary output, then place o at its pin location. For o, its pin
location is its closest point. The induction hypothesis is that the fanouts
of a node n are placed at their closest points while still maintaining
monotonicity, i.e. the distances from their closest points to their primary
outputs are their Manhattan distances. We show that n can also be
placed at its closest point while still maintaining monotonicity.

Let $n_f$ be an arbitrary fanout of n. Let $c'$ be the node placement
constraint of $n_f$ with all fanins except n removed. Also let the node
placement constraints of n and $n_f$ be $c$ and $c_f$. Then $c_f$ is derived
from $c'$ by adding the primary inputs of fanins of $n_f$ other than n, and
$c$ is derived from $c'$ by adding the primary outputs of fanouts of n
other than $n_f$. We know that $R(c') \ne \emptyset$ because $c' \subseteq c$ and $R(c) \ne \emptyset$
by the assumption that n is legal. By applying Lemma 4 for each
primary input added to $c'$ to form $c_f$, $R(c_f)$ includes the closest point
of $R(c')$. Since $R(c) \subseteq R(c')$, the distance from the closest point of
$R(c)$ to a primary output o is the same as the sum of the distance from
the closest point of $R(c)$ to the closest point of $R(c_f)$ and the distance
from the closest point of $R(c_f)$ to o. Hence, the monotonic property
is maintained and n can be placed at the closest point of $R(c)$.

■ 

Theorem 5.1 reduces our problem of finding a monotonic point
placement of a circuit into the problem of finding a legal Boolean
network. The logic synthesis transformation we use to convert an
illegal Boolean network into a legal one is called make_legal, and it is
explained below.

5.2.4 Make_legal

The make_legal operation takes a Boolean network as its input and
produces a legal Boolean network. In the effort of producing a legal
Boolean network, it attempts to minimize the number of new Boolean
nodes created.

The following lemma and corollary guarantee that a Boolean network
can always be made legal.

Lemma 5 If $n \to n_f$, and n is illegal but $n_f$ is legal, then collapsing n
into $n_f$ will not make $n_f$ illegal.

Proof: Collapsing n into $n_f$ does not change the support of $n_f$, nor does
it add any primary output to the transitive fanout of $n_f$. Therefore, the
node placement constraint of $n_f$ does not change and hence $n_f$ stays
legal. ■
By the proof of Theorem 5.1, we know that every primary output is 
legal. Then it is easy to see the following corollary. 

Corollary 5.2 An illegal Boolean network can always be made legal
by collapsing all nodes into the primary output nodes.

Besides collapsing, node duplication can also legalize a node.

Lemma 6 If $n \to n_f$, $n \to n_g$, and n is illegal but both $n_f$ and $n_g$ are
legal, then duplicating n into $n_1 \to n_f$ and $n_2 \to n_g$ makes $n_1$ and $n_2$
legal.

Proof: The support of n is a subset of both the supports of $n_f$ and
$n_g$, but the output set of the node placement constraint of n is a superset
of the output sets of the node placement constraints of both $n_f$ and $n_g$.
By duplicating n into $n_1 \to n_f$ and $n_2 \to n_g$, the node placement constraint
of $n_1$ is contained in that of $n_f$ and thus $n_1$ is legal. Similarly for $n_2$. ■

Make_legal traverses the Boolean network in a reverse topological
order, i.e. a node is visited after all its fanouts have been visited.
During the traversal, if it sees an illegal node, it collapses the node into its
fanouts until the node becomes legal. Hence, there is a frontier moving
from each primary output to the primary inputs in its support, where every
node is legal on the side of the frontier toward the primary output. If
the sum-of-products expression of the fanout, as a result of collapsing a
node into one of its fanouts, exceeds a user-defined number of literals, r,
the node is replicated for each fanout until it becomes
legal. The intuition behind this parameter is that large nodes tend to
have more common subfunctions with other nodes and thus allow for
sharing. However, the parameter should not be too large since it can
result in an explosion in memory usage.
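
The visiting order used by make_legal can be sketched as a depth-first post-order over fanout edges. This is only an illustration of the reverse topological order; the dictionary-based network representation and the function name are assumptions for this example, and the collapsing and duplication steps themselves are not modeled.

```python
def reverse_topological_order(fanouts):
    """Return the nodes so that each node appears after all of its fanouts.

    fanouts: dict mapping every node name to the list of its fanout nodes
    (hypothetical representation of an acyclic Boolean network).
    """
    order = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        # Visit all fanouts first, so they precede `node` in the order.
        for fanout in fanouts.get(node, ()):
            visit(fanout)
        order.append(node)

    for node in fanouts:
        visit(node)
    return order
```

A legalization pass would walk this list from the front, so primary outputs are handled first and the frontier of legal nodes moves toward the primary inputs.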

As shown above, legality of a node can be checked efficiently, that
is, it is linear in the size of the node placement constraint. Hence, the
make_legal operation is efficient.

5.3 Constraint-Driven Synthesis

The constraint generation step takes a possibly illegal Boolean net- 
work and makes it legal. Theorem 5.1 guarantees that there exists 
a point placement for this network. However, by definition of the 
point placement of a circuit, nodes are assumed to be a point; hence, 
they have no area. In reality, nodes have area and the length of a 
longest path depends strongly on the size of a Boolean network. The 
constraint-driven synthesis step is responsible for minimizing the area 
of an already legal Boolean network while preserving its legality. As 
mentioned in Section 5, we use the number of factored-form literals of 
a Boolean network as a measure of the area of the circuit represented 
by the Boolean network. So this step is to optimize the network such 
that we get a minimum literal legal Boolean network. 

We leverage the well developed algebraic transformations in the 
conventional logic synthesis by extending them to deal with and pro- 
duce legal Boolean nodes. Each of these operations is explained be- 
low. 



5.3.1 Fast_extract

The fast_extract algorithm is explained in [16]. It basically looks for
a two-cube divisor or a two-literal cube that reduces the most number
of literals in every iteration.

When dealing with a legal Boolean network, this algorithm may result
in illegal divisors. For example, assume that node n is the best
divisor found and it divides nodes x, y, and z. Then the output set of
the node placement constraint of n is the union of the output sets of the
node placement constraints of x, y, and z. From Section 5.2, we know
that the legal region of n may be empty and n may therefore be illegal.
However, it may be the case that n remains legal if it only divides x
and y, or x and z, etc. Hence, the fast_extract algorithm is modified
such that the best legal divisor is chosen in every iteration.

If node n divides a set of nodes N, then the complexity of finding a
subset $N_l$ of N which preserves the legality of n and has the largest
reduction in the number of literals is exponential in the size of N. Hence,
a heuristic is used to select a good subset. First the nodes in N are
ordered in decreasing sizes of the legal regions to form a list $N_{sorted}$.
Then $N_{sorted}$ is linearly traversed. Each node is added to the subset $N_l$
if the legality of n is preserved. Node n is used as a divisor if it reduces
the number of literals in the network.
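
The subset-selection heuristic can be sketched as a single greedy pass. This is a minimal illustration under stated assumptions: `region_size` and `stays_legal` are hypothetical callbacks standing in for the legal-region size of a node and the legality check of the divisor n for a given set of divided nodes; neither is an SIS function.

```python
def greedy_legal_subset(candidates, region_size, stays_legal):
    """Pick a subset of `candidates` that keeps the divisor legal.

    candidates:  nodes that the divisor n divides (the set N).
    region_size: hypothetical callback, node -> size of its legal region.
    stays_legal: hypothetical callback, list of nodes -> True if n is
                 still legal when it divides exactly those nodes.
    """
    # Order by decreasing legal-region size (the list N_sorted).
    ordered = sorted(candidates, key=region_size, reverse=True)
    chosen = []
    # One linear traversal: keep a node only if legality is preserved.
    for node in ordered:
        if stays_legal(chosen + [node]):
            chosen.append(node)
    return chosen
```

The pass makes one legality check per candidate, so it avoids the exponential search over subsets at the cost of possibly missing the best subset.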

In this paper, the fast_extract implemented in SIS is used.

5.3.2 Resubstitution

In conventional logic synthesis, a node n is resubstituted into
another node x if n divides x. This may affect the legality of both n and
x. The following observation states when n and x can become illegal.

Observation 1 If n divides x and both n and x are legal before
resubstitution, then after resubstitution

1. x can become illegal if its support is not the superset of that of n.

2. n can become illegal if its output set is not the superset of that of
x.

In this paper, n can only be resubstituted into x if the legality of n is
preserved. Hence, a check is made before every resubstitution.
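
Observation 1 suggests a cheap, conservative pre-check based only on set containment. The sketch below is an assumption-laden illustration, not the check implemented in the paper: it accepts a resubstitution only when neither condition of Observation 1 can be triggered, and a full implementation would fall back to the legality test when this check fails. The function and parameter names are hypothetical.

```python
def resub_cannot_break_legality(support_n, outputs_n, support_x, outputs_x):
    """Conservative test derived from Observation 1.

    support_*: sets of primary inputs in the support of n and x.
    outputs_*: output sets of the node placement constraints of n and x.
    Returns True when neither n nor x can become illegal by the
    two conditions of Observation 1.
    """
    # Condition 1: x is safe if its support already contains that of n.
    x_safe = set(support_x) >= set(support_n)
    # Condition 2: n is safe if its output set already contains that of x.
    n_safe = set(outputs_n) >= set(outputs_x)
    return x_safe and n_safe
```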

5.3.3 Full_simplify

There are two types of don't cares, i.e. the observability don't cares
(ODCs) and the satisfiability don't cares (SDCs). Computing the exact
ODCs of a node is computationally expensive. In practice, a subset of
the ODCs called the compatible ODCs (CODCs) is computed. These
CODCs are expressed in terms of the primary inputs. Then together
with the external don't cares (XDCs) of the primary outputs, a don't
care set in terms of the immediate fanins is computed using an image
computation. In computing the SDCs, a support filter is used. A
node is included in the SDCs if its support set intersects the support
set of the node being considered. Employing SDCs in the minimization
procedure can result in Boolean resubstitutions. The support filter
procedure can also be used in the image computation of the CODCs
and XDCs. Once the SDCs are computed and the XDCs and CODCs
are expressed in terms of immediate fanins, a two-level minimization
algorithm is invoked to find an optimized expression. This is simply
a brief description of full_simplify. For a more detailed explanation,
we refer the readers to [10].

Lemma 7 Throughout the full_simplify computation, the only steps that
can introduce illegality into the network are the image computation
and the SDC computation.

Proof: Let node n be the node we are computing don't cares for.
Legality of the Boolean network can only change if an edge is added
to the network. During the whole full_simplify process, only the fanin
edges of n can be added. Edges of fanins of other nodes cannot
change. Adding a fanin edge to n means that a resubstitution happens
and Observation 1 applies. Potential new fanin edges of n are added
only during the image computation and SDC computation through the
support filter, which basically says that a node x is a potential divisor
of n if the support of x intersects the support of n. ■

We therefore constrain this operation by allowing a node x to be
in the support filter when computing full_simplify for node n only if the
inclusion of node x preserves the legality of the network according to
Observation 1.

5.3.4 Synthesis Flow

With all the above basic operations, a synthesis flow is then a script 
similar to the script.rugged in SIS. An empirical study needs to be 
conducted to derive an optimal script. 

6 Experimental Results 

To see the effect of the proposed approach, we have implemented the
basic operations described in Section 5.3. An optimization script has
been created and we call it script.wire, which consists of:

make_legal
eliminate 5
sweep; eliminate -1
simplify -m nocomp
eliminate -1
sweep; eliminate 5
simplify -m nocomp
resub -a
fx
resub -a; sweep
eliminate -1; sweep
full_simplify -m nocomp

Our experiment uses SIS and Ritual version 3.4, a timing-driven
standard cell placer [12]. The input blif file and a randomly generated
pad assignment file are read into SIS. The script.wire optimization
script is run in SIS to generate an optimized logic netlist. The
optimized netlist is mapped to the standard cell technology library
stdcell2_2.genlib of SIS. The mapped netlist is then placed by Ritual with
a fixed pad assignment. We measure the length of the longest path and
the delay of the Ritual output. The distance between two cells is measured
as the Manhattan distance between the centers of both cells. The length of
a path is the sum of all distances between consecutive cells along the
path.
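
The path-length measure described above can be written down directly. The following is a minimal sketch; representing a path as a list of (x, y) cell centers is an assumption for this example, and the monotonicity test simply compares the path length with the Manhattan distance between its endpoints.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def path_length(centers):
    """Sum of Manhattan distances between consecutive cell centers."""
    return sum(manhattan(a, b) for a, b in zip(centers, centers[1:]))


def is_monotonic(centers):
    """A path has no detour if its length equals the end-to-end distance."""
    return path_length(centers) == manhattan(centers[0], centers[-1])
```

For the path (0, 0), (2, 1), (3, 4) the length is 3 + 4 = 7, which equals the Manhattan distance between its endpoints, so the path is monotonic; inserting a cell that backtracks in x makes the path strictly longer than that distance.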

Table 1 shows the results for four circuits. The circuit bbaraComb is
obtained from the sequential circuit bbara by removing all latches and
treating the outputs of the latches as primary inputs and the inputs to
the latches as primary outputs of the network. Columns 2, 3, and 4 show
the number of literals in factored form for the scripts script.rugged,
script.delay, and script.wire respectively. Columns 5, 6, and 7 list the
length of the longest path for each script. Columns 8, 9, and 10 show
the CPU time. The experiments were run on a DEC AlphaServer 8400
with 2GB of memory. The runtime is for the technology independent
step.

As shown in this table, although the number of literals in the script.wire
approach is more than that of script.rugged, the length of its longest
path is the same for rd53 and better in other circuits. The longest paths
are much shorter than the script.delay results. As seen from this table, the
runtime is comparable. This is expected since the legality checking
is linear in the size of the node placement constraints and hence its
runtime is a minor part of the total runtime.

Table 2 shows the delay computed by Ritual for the four circuits.
Columns 2, 3, and 4 show the cell delay for each script. The wire delay
is shown in columns 5, 6, and 7. The total delay is listed in columns 8,
9, and 10. Except for the total delay of z4ml running script.delay, the
total delay of all circuits is the best using script.wire.

Figure 9: Number of literals vs. number of nodes legalized for C1355.

7 Open Issues and Future Work 

Though the results in the previous section show that the approach
performs satisfactorily, these circuits are fairly small. For bigger circuits,
the number of nodes in a legal network can be large and optimizing
such large networks using operations like fast_extract and full_simplify
can be very expensive.

To illustrate this, we plot the number of literals versus the number of
nodes in the constraint generation step for C1355 as shown in Figure 9.
On the x-axis is the number of illegal nodes that are legalized. On the
y-axis is the number of literals in the Boolean network. The network
increases from 1032 literals to 23709 literals after 216 nodes have been
legalized out of a total of 514 nodes in the network.

There are various directions that can be pursued to address this
problem. The first one is to improve the area optimization algorithm
presented in this paper. Rewiring and redundancy removal is a
technique that falls into this direction. SPFDs [2] can be used to minimize
and rewire circuits, and potentially to legalize nodes. In this paper, we are
assuming that we are given a circuit represented as a Boolean network.
We then apply make_legal and several algebraic transformations
followed by don't care minimization. The final circuit depends on the
quality of the initial Boolean network. Alternatively, the Boolean
network can first be collapsed as much as possible into a two-level circuit
where all primary outputs are expressed in terms of primary inputs. Then
functional decomposition, like [11], can be used to decompose the
network into a minimum literal legal Boolean network.

The second direction, which we believe is more promising, is to relax
the constraint that every path must be monotonic. In other words, this
is about solving the slack-based synthesis problem instead of the more
restrictive IP-based synthesis problem. This can be done by applying

Table 1: Path length comparison of script.rugged, script.delay, and script.wire for IP-based synthesis.

            Number of Literals             Length of Longest Path         CPU Time
Name        sc.rugged  sc.delay  sc.wire   sc.rugged  sc.delay  sc.wire   sc.rugged  sc.delay  sc.wire
z4ml        41         84        49        1324       1342      1025      0.2        0.3       0.3
rd53        42         62        50        1122       1624      1122      0.1        0.3       0.2
rd73        74         178       87        1689       2457      1680      0.8        1.8       1.2
bbaraComb   69         79        109       2021       1573      1464      0.5        0.5       0.3


Table 2: Delay comparison of script.rugged, script.delay, and script.wire for IP-based synthesis.

            Cell Delay                     Wire Delay                     Total Delay
Name        sc.rugged  sc.delay  sc.wire   sc.rugged  sc.delay  sc.wire   sc.rugged  sc.delay  sc.wire
z4ml        5.66       5.28      4.78      0.93       1.03      0.97      6.59       6.31      5.75
rd53        9.73       7.37      5.94      1.67       2.13      1.42      11.40      9.50      7.36
rd73        7.01       5.09      5.59      1.37       0.88      0.86      8.38       5.97      6.45
bbaraComb   8.18       6.22      4.90      2.19       1.72      1.08      10.37      7.94      5.98


the IP-based synthesis algorithm only to a subset of the paths.
Intuitively, we can wireplan only the critical paths so that no diversions
are allowed in them; other paths can have diversions. One approach
would be to modify the definition of legality so that legality is checked
based on the primary inputs and outputs that are relevant only to the
critical paths. Only the nodes on the critical paths are legalized. We
have done some preliminary experiments and our results show that if
the top few longest paths are selected and all the nodes on those
paths are legalized, then the area penalty is not very high. However, at present there
is no easy way to perform a meaningful comparison of this approach
(i.e. the modified IP-based algorithm to solve the slack-based synthesis
problem) with the conventional approach. For that, we need a placement
tool that uses the same delay model as ours, and we have not been
successful at making Ritual use our model.

One other issue that needs further attention is that of pin assignment.
The approach in this paper assumes that the pin assignment is given.
In the design process, usually only partial pin assignment is given. 
However, the quality of the final solution strongly depends on the pin 
locations. Therefore, we need to look into algorithms to find good 
pin assignment during synthesis. Such an algorithm can also be used 
to extend this approach to handle sequential circuits by finding good 
placement for the latches present. 

The optimizations that we have shown are technology independent.
We have not yet addressed the issue of technology mapping. Also,
we have completely ignored gate delays. We are presently looking
into both of these issues, i.e. technology mapping and how to best
incorporate gate delays in our approach.

Finally we are also looking into extending the proposed approach 
to handle other interconnect issues, like crosstalk and reliability. 

8 Conclusions 

We have proposed a new approach to deal with the increasing
importance of wire delays in deep submicron technologies. It is based
on the fact that the length of the shortest path between any two points in
a circuit is the Manhattan distance between them. We showed an example of why
conventional logic synthesis may produce circuits where the minimum
distance cannot be achieved.



The proposed approach still decouples the logic synthesis phase and
the place & route phase. It consists of a constraint generation step which
produces a legal Boolean network, which can be placed such that every
path is monotonic, and a constraint-driven synthesis step which
minimizes the legal Boolean network while preserving legality. We
show an example of how this approach can be extended to solve the
slack-based synthesis problem. Finally, we describe directions for
future work, which include an investigation into a new placement tool
that works together with the proposed approach.